The Rashomon Effect describes the following phenomenon: for a given dataset
there may exist many models with equally good performance but with different
solution strategies. The Rashomon Effect has implications for Explainable
Machine Learning, especially for the comparability of explanations. We provide
a unified view on three different comparison scenarios and conduct a
quantitative evaluation across different datasets, models, attribution methods,
and metrics. We find that hyperparameter-tuning plays a role and that metric
selection matters. Our results provide empirical support for previously
anecdotal evidence and exhibit challenges for both scientists and
practitioners.
Tensor network (TN) representation is a powerful technique for data analysis
and machine learning. In practice, it involves a challenging TN structure search
(TN-SS) problem, which aims to search for the optimal structure to achieve a
compact representation. Existing TN-SS methods mainly adopt a bi-level
optimization method that leads to excessive computational costs due to repeated
structure evaluations. To address this issue, we propose an efficient
integrated (single-level) method named SVD-inspired TN decomposition
(SVDinsTN), eliminating the need for repeated tedious structure evaluation. By
inserting a diagonal factor for each edge of the fully-connected TN, we
calculate TN cores and diagonal factors simultaneously, with factor sparsity
revealing the most compact TN structure. Experimental results on real-world
data demonstrate that SVDinsTN achieves a runtime speedup of roughly
$10\times$ to $10^3\times$ over existing TN-SS methods while maintaining a
comparable level of representation ability.
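The matrix case gives some intuition for the mechanism (our analogy for illustration, not the paper's algorithm): in an SVD $A = U\,\mathrm{diag}(s)\,V^\top$, the diagonal factor is computed once, and the sparsity pattern of $s$ reveals the most compact structure, i.e. the rank, without evaluating candidate structures one by one.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))  # exactly rank 3

# The diagonal factor (singular values) is obtained in a single decomposition;
# pruning its near-zero entries exposes the compact structure directly.
s = np.linalg.svd(A, compute_uv=False)
rank = int((s > 1e-8 * s[0]).sum())
```

SVDinsTN generalizes this single-shot idea by attaching one such diagonal factor to every edge of a fully-connected tensor network.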
We study the problem of learning mixtures of Gaussians with censored data.
Statistical learning with censored data is a classical problem, with numerous
practical applications; however, finite-sample guarantees for even simple
latent variable models such as Gaussian mixtures are missing. Formally, we are
given censored data from a mixture of univariate Gaussians $$
\sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2), $$ i.e. the sample is observed
only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and
the means $\mu_i$. We propose an algorithm that takes only
$\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the
means $\mu_i$ within $\varepsilon$ error.
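The censoring mechanism above is easy to simulate. The sketch below is illustrative only: the interval form of the set $S$ and all parameter values are our choices, not the paper's. A draw from the mixture is kept only if it lands inside $S$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_censored_mixture(weights, means, sigma, S, n):
    """Draw n observations from sum_i w_i N(mu_i, sigma^2); a draw is
    observed only if it lies inside the censoring set S (here an
    interval (lo, hi), though the abstract allows a general set)."""
    lo, hi = S
    out = []
    while len(out) < n:
        i = rng.choice(len(weights), p=weights)  # pick a mixture component
        x = rng.normal(means[i], sigma)
        if lo <= x <= hi:                        # censoring: keep only inside S
            out.append(x)
    return np.array(out)

xs = sample_censored_mixture([0.3, 0.7], [-2.0, 2.0], 1.0, (0.0, 5.0), 1000)
```

Note that censoring skews the observed component proportions away from the true weights $w_i$, which is exactly what makes recovering them nontrivial.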
Energy time-series analysis describes the process of analyzing past energy
observations and possibly external factors so as to predict the future.
Different tasks are involved in the general field of energy time-series
analysis and forecasting, with electric load demand forecasting, personalized
energy consumption forecasting, as well as renewable energy generation
forecasting being among the most common ones. Following the exceptional
performance of Deep Learning (DL) in a broad area of vision tasks, DL models
have successfully been utilized in time-series forecasting tasks. This paper
aims to provide insight into various DL methods geared towards improving the
performance in energy time-series forecasting tasks, with special emphasis in
Greek Energy Market, and equip the reader with the necessary knowledge to apply
these methods in practice.
This research paper focuses on the implementation of Radial Basis Function
(RBF) Support Vector Machines (SVM) for classifying asteroid orbits. Asteroids
are important astronomical objects, and their orbits play a crucial role in
understanding the dynamics of the solar system. The International Astronomical
Union maintains data archives that provide a playground to experiment with
various machine-learning techniques. In this study, we explore the application
of RBF SVM algorithm to classify asteroids. The results show that the RBF SVM
algorithm provides a good efficiency and accuracy to the dataset. We also
analyze the impact of various parameters on the performance of the RBF SVM
algorithm and present the optimal parameter settings. Our study highlights the
importance of using machine learning techniques for classifying asteroid orbits
and the effectiveness of the RBF SVM algorithm in this regard.
Medical image segmentation is particularly critical as a prerequisite for
quantitative analysis in the treatment of clinical diseases. In clinical
cervical cancer radiotherapy, for example, fast and accurate segmentation of
organs and tumors in subabdominal MRI images can streamline the radiotherapy
workflow. Traditional approaches rely on manual annotation by specialist
doctors, which is time-consuming and laborious; automatic organ segmentation
of subabdominal MRI images is therefore a valuable research topic.
Despite the recent development in machine learning, most learning systems are
still under the concept of "black box", where the performance cannot be
understood and derived. With the rise of safety and privacy concerns in public,
designing an explainable learning system has become a new trend in machine
learning. In general, many machine learning problems are formulated as
minimizing (or maximizing) some loss function. Since real data are most likely
generated from non-linear models, the loss function is non-convex in general.
Unlike in convex optimization, gradient descent algorithms can become trapped
in spurious local minima when solving non-convex problems. Therefore,
it is challenging to provide explainable algorithms when studying non-convex
optimization problems. In this thesis, two popular non-convex problems are
studied: (1) low-rank matrix completion and (2) neural network learning.
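A one-dimensional example makes the local-minima point concrete (a toy illustration of our own, not drawn from the thesis): the same gradient-descent procedure returns different stationary points depending only on where it starts, so the returned solution cannot be explained without reference to the initialization.

```python
def grad(x):
    # f(x) = x**4 - 2*x**2 + 0.5*x has two local minima of different depth
    return 4 * x**3 - 4 * x + 0.5

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_left = descend(-1.5)   # reaches the deeper minimum near x = -1.06
x_right = descend(1.5)   # trapped in the shallower minimum near x = 0.93
```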
A dynamical system produces a dependent multivariate sequence called
dynamical time series, developed with an evolution function. As variables in
the dynamical time series at the current time-point usually depend on all
variables at the previous time-point, existing studies forecast the variables
at the future time-point by estimating the evolution function. However, some
variables in the dynamical time-series are missing in some practical
situations. In this study, we propose an autoregressive with slack time series
(ARS) model. The ARS model involves the simultaneous estimation of the evolution
function and the underlying missing variables as a slack time series, with the
aid of the time-invariance and linearity of the dynamical system. This study
empirically demonstrates the effectiveness of the proposed ARS model.
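Under the linearity and time-invariance assumptions invoked above, the evolution function reduces to a single matrix, and with fully observed data it can be estimated by least squares. The toy sketch below shows that step only; it is our illustration, not the ARS estimator itself, which additionally treats the missing variables as a slack time series.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear time-invariant system: x_{t+1} = A x_t + noise.
d, T = 3, 200
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.1],
                   [0.1, 0.0, 0.7]])
X = np.zeros((T, d))
X[0] = rng.normal(size=d)
for t in range(T - 1):
    X[t + 1] = A_true @ X[t] + 0.1 * rng.normal(size=d)

# Least-squares estimate of the evolution matrix from observed transitions:
# minimize ||X[1:] - X[:-1] @ A.T||_F over A.
A_hat = np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0].T
```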
In this paper we demonstrate both theoretically and numerically that
neural networks can detect model-free static arbitrage opportunities whenever
the market admits some. Due to the use of neural networks, our method can be
applied to financial markets with a high number of traded securities and
ensures almost immediate execution of the corresponding trading strategies. To
demonstrate its tractability, effectiveness, and robustness we provide examples
using real financial data. From a technical point of view, we prove that a
single neural network can approximately solve a class of convex semi-infinite
programs; this is the key result underlying our theoretical guarantee that
neural networks can detect model-free static arbitrage strategies whenever
the financial market admits such opportunities.
Machine learning (ML) and tensor-based methods have been of significant
interest for the scientific community for the last few decades. In a previous
work we presented a novel tensor-based system identification framework to ease
the computational burden of tensor-only architectures while still being able to
achieve exceptionally good performance. However, the derived approach can only
process real-valued problems and is therefore not directly applicable
on a wide range of signal processing and communications problems, which often
deal with complex-valued systems. In this work we therefore derive two new
architectures to allow the processing of complex-valued signals, and show that
these extensions are able to surpass the trivial, complex-valued extension of
the original architecture in terms of performance, while only requiring a
slight overhead in computational resources to allow for complex-valued
operations.
Numerical data imputation algorithms replace missing values by estimates to
leverage incomplete data sets. Current imputation methods seek to minimize the
error between the unobserved ground truth and the imputed values. But this
strategy can create artifacts leading to poor imputation in the presence of
multimodal or complex distributions. To tackle this problem, we introduce the
$k$NN$\times$KDE algorithm: a data imputation method combining nearest neighbor
estimation ($k$NN) and density estimation with Gaussian kernels (KDE). We
compare our method with previous data imputation methods using artificial and
real-world data with different data missing scenarios and various data missing
rates, and show that our method can cope with complex original data structures,
yields lower data imputation errors, and provides probabilistic estimates with
higher likelihood than current methods. We release the code in open-source for
the community: https://github.com/DeltaFloflo/knnxkde
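To convey the idea without reproducing the released implementation, here is a toy numpy sketch under our own simplifications (donors are restricted to fully observed rows, and a fixed Gaussian bandwidth is assumed): instead of averaging neighbours, each missing entry is drawn from a Gaussian-kernel mixture centred on the k nearest donors' values, so multimodal columns stay multimodal.

```python
import numpy as np

def knn_kde_impute(X, k=2, bandwidth=0.3, seed=0):
    rng = np.random.default_rng(seed)
    X = X.copy()
    miss = np.isnan(X)
    donors = X[~miss.any(axis=1)]               # fully observed rows only
    for i, j in zip(*np.where(miss)):
        obs = ~miss[i]                          # coordinates observed in row i
        d = np.linalg.norm(donors[:, obs] - X[i, obs], axis=1)
        nn_vals = donors[np.argsort(d)[:k], j]  # column-j values of k nearest donors
        # Sample from the KDE over neighbour values: pick a kernel centre,
        # then perturb it with Gaussian noise of width `bandwidth`.
        X[i, j] = rng.choice(nn_vals) + bandwidth * rng.normal()
    return X

X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [0.05, np.nan]])
X_imp = knn_kde_impute(X)
```

Because the imputation is a sample rather than a conditional mean, it avoids the between-modes artifacts the abstract warns about.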
This paper revisits an adaptation of the random forest algorithm for
Fr\'echet regression, addressing the challenge of regression in the context of
random objects in metric spaces. Recognizing the limitations of previous
approaches, we introduce a new splitting rule that circumvents the
computationally expensive operation of computing Fr\'echet means by
substituting a medoid-based approach. We validate this approach by demonstrating its
asymptotic equivalence to Fr\'echet mean-based procedures and establish the
consistency of the associated regression estimator. The paper provides a sound
theoretical framework and a more efficient computational approach to Fr\'echet
regression, broadening its application to non-standard data types and complex
use cases.
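The substitution is simple to state in code. The following is a generic Euclidean toy, not the paper's metric-space implementation: the medoid restricts the Fréchet-mean minimization to the sample points themselves, so only a pairwise distance matrix is needed.

```python
import numpy as np

def medoid(points):
    """Sample point minimizing the sum of squared distances to the others:
    the within-sample surrogate for the Frechet mean's argmin over the space."""
    d2 = ((points[:, None] - points[None, :]) ** 2).sum(axis=-1)
    return points[np.argmin(d2.sum(axis=1))]

pts = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.1], [10.0, 10.0]])
m = medoid(pts)
```

Unlike a Fréchet mean, the result is guaranteed to be an existing data object, which is what keeps the splitting rule cheap for non-Euclidean data.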
Wasserstein gradient flows on probability measures have found a host of
applications in various optimization problems. They typically arise as the
continuum limit of exchangeable particle systems evolving by some mean-field
interaction involving a gradient-type potential. However, in many problems,
such as in multi-layer neural networks, the so-called particles are edge
weights on large graphs whose nodes are exchangeable. Such large graphs are
known to converge to continuum limits called graphons as their size grows to
infinity. We show that the Euclidean gradient flow of a suitable function of
the edge-weights converges to a novel continuum limit given by a curve on the
space of graphons that can be appropriately described as a gradient flow or,
more technically, a curve of maximal slope. Several natural functions on
graphons, such as homomorphism functions and the scalar entropy, are covered by
our set-up, and the examples have been worked out in detail.
This paper proposes a multi-object tracking (MOT) algorithm for traffic
monitoring using a drone equipped with optical and thermal cameras. Object
detections on the images are obtained using a neural network for each type of
camera. The cameras are modelled as direction-of-arrival (DOA) sensors. Each
DOA detection follows a von Mises-Fisher distribution, whose mean direction is
obtained by projecting a vehicle position on the ground to the camera. We then
use the trajectory Poisson multi-Bernoulli mixture filter (TPMBM), which is a
Bayesian MOT algorithm, to optimally estimate the set of vehicle trajectories.
We have also developed a parameter estimation algorithm for the measurement
model. We have tested the accuracy of the resulting TPMBM filter in synthetic
and experimental data sets.
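The measurement model can be sketched in a few lines. This is an illustration under our own conventions (ground plane z = 0, an arbitrary concentration kappa), not the paper's calibrated setup: the vMF mean direction is the unit vector from the camera to the vehicle's ground position.

```python
import numpy as np

def doa_mean_direction(cam_pos, vehicle_xy):
    """Unit vector from the camera to a vehicle on the ground plane z = 0."""
    target = np.array([vehicle_xy[0], vehicle_xy[1], 0.0])
    v = target - cam_pos
    return v / np.linalg.norm(v)

def vmf_logpdf(x, mu, kappa):
    """Log-density of the von Mises-Fisher distribution on the sphere S^2:
    p(x) = kappa / (4 pi sinh(kappa)) * exp(kappa * mu . x)."""
    return np.log(kappa) - np.log(4 * np.pi * np.sinh(kappa)) + kappa * (mu @ x)

cam = np.array([0.0, 0.0, 50.0])        # drone camera 50 m above the origin
mu = doa_mean_direction(cam, (30.0, 40.0))
```

A detection aligned with `mu` is far more likely under this model than one pointing elsewhere, which is what lets the filter weigh DOA evidence against predicted trajectories.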
We propose a new nonparametric modeling framework for causal inference when
outcomes depend on how agents are linked in a social or economic network. Such
network interference underlies a large literature on treatment spillovers,
social interactions, social learning, information diffusion, disease and
financial contagion, social capital formation, and more. Our approach works by
first characterizing how an agent is linked in the network using the
configuration of other agents and connections nearby as measured by path
distance. The impact of a policy or treatment assignment is then learned by
pooling outcome data across similarly configured agents. We demonstrate the
approach by proposing an asymptotically valid test for the hypothesis of policy
irrelevance/no treatment effects and bounding the mean-squared error of a
k-nearest-neighbor estimator for the average or distributional policy
effect/treatment response.
We study causal inference and efficient estimation for the expected number of
recurrent events in the presence of a terminal event. We define our estimand as
the vector comprising both the expected number of recurrent events and the
failure survival function evaluated along a sequence of landmark times. We
identify the estimand in the presence of right-censoring and causal selection
as an observed data functional under coarsening at random, derive the
nonparametric efficiency bound, and propose a multiply-robust estimator that
achieves the bound and permits nonparametric estimation of nuisance parameters.
Throughout, no absolute continuity assumption is made on the underlying
probability distributions of failure, censoring, or the observed data.
Additionally, we derive the class of influence functions when the coarsening
distribution is known and review how published estimators may belong to the
class. Along the way, we highlight some interesting inconsistencies in the
causal lifetime analysis literature.
I've noted before that because AI detectors produce false positives, it's unethical to use them to detect cheating.
Now there's a new study that shows it's even worse. Not only do AI detectors falsely flag human-written text as AI-written, the way in
A new dataset can help scientists develop automatic systems that generate richer, more descriptive captions for online charts.
Cost of poor quality is top of mind for manufacturers. Quality defects increase scrap and rework costs, decrease throughput, and can impact customers and company reputation. Quality inspection on the production line is crucial for maintaining quality standards. In many cases, human visual inspection is used to assess the quality and detect defects, which can […]
Not long ago, I published an article entitled “The Sound that Data Makes”. The goal was turning data — random noise in this case — into music. The hope was that by “listening” to your data, you could gain a different kind of insight, not conveyed by visualizations or tabular summaries. This article is a… Read More »The music of the Riemann Hypothesis: Sound Generation in Python
The post The music of the Riemann Hypothesis: Sound Generation in Python appeared first on Data Science Central.
Organizations are continuously investing time and effort in developing intelligent recommendation solutions to serve customized and relevant content to their users. The goals can be many: transform the user experience, generate meaningful interaction, and drive content consumption. Some of these solutions use common machine learning (ML) models built on historical interaction patterns, user demographic attributes, […]
Fine-tuning large language models (LLMs) allows you to adjust open-source foundational models to achieve improved performance on your domain-specific tasks. In this post, we discuss the advantages of using Amazon SageMaker notebooks to fine-tune state-of-the-art open-source models. We utilize Hugging Face’s parameter-efficient fine-tuning (PEFT) library and quantization techniques through bitsandbytes to support interactive fine-tuning of […]
Imagine an AI model that can seamlessly generate high-quality content across text, images, video, and audio, all at once. Such a model would more accurately capture the multimodal nature of the world and human comprehension, seamlessly consolidate information from a wide range of sources, and enable strong immersion in human-AI interactions. This could transform the […]
The post Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation appeared first on Microsoft Research.
MUE Studio, founded by 3D artists Minjin Kang and Mijoo Kim, specializes in art direction, photography and 3D design for campaigns and installations.
It’s a jam-packed July with 14 newly supported titles in the GeForce NOW library, including Remnant II from Gunfire Games and Gearbox Publishing. Need a new adventure? Check out the nine additions streaming from the cloud this week. Plus, the Steam Summer Sale kicks off this week, and many supported titles in the GeForce NOW Read article >
In the first part of this blog, we discussed how coding could be a collaborative experience using tools like GitHub Copilot. In the second part, we will explore the impact and significance of GitHub Copilot-driven collaboration on the wider developer ecosystem. As we have seen, developers have a specific definition of collaboration… Read More »Will coding be a collaborative experience using GitHub Copilot? – Part two
The post Will coding be a collaborative experience using GitHub Copilot? – Part two appeared first on Data Science Central.
The world of higher education is undergoing a transformative shift as artificial intelligence (AI) continues to reshape various aspects of our society. From classrooms to career development, the integration of AI and its impact on learning is undeniable. In this article, we will explore the intersection of AI, certification, and higher education, and delve into… Read More »Navigating the future of learning: AI, certification, and higher education
The post Navigating the future of learning: AI, certification, and higher education appeared first on Data Science Central.
Figure 1: CoarsenConf architecture.
(I) The encoder $q_\phi(z| X, \mathcal{R})$ takes the fine-grained (FG) ground truth conformer $X$, RDKit approximate conformer $\mathcal{R}$, and coarse-grained (CG) conformer $\mathcal{C}$ as inputs (derived from $X$ and a predefined CG strategy), and outputs a variable-length equivariant CG representation via equivariant message passing and point convolutions.
(II) Equivariant MLPs are applied to learn the mean and log variance of both the posterior and prior distributions.
(III) The posterior (training) or prior (inference) is sampled and fed into the Channel Selection module, where an attention layer is used to learn the optimal pathway from CG to FG structure.
(IV) Given the FG latent vector and the RDKit approximation, the decoder $p_\theta…
In today’s digital landscape, data privacy, and security have become the most critical concerns for businesses across industries. With the ever-evolving threat of data breaches, unauthorized access, and privacy violation, companies are increasingly seeking innovative ways to protect their digital assets and sensitive information. One such solution that helps businesses significantly safeguard their crucial information… Read More »How decentralized apps can help businesses improve data security and privacy
The post How decentralized apps can help businesses improve data security and privacy appeared first on Data Science Central.
The benefits, types, and processes of data transformation and how it contributes to data management, integration, and new technologies.
The post Data transformation 101: Process and new technologies appeared first on Data Science Central.
In the era of big data and AI, harnessing weather data to predict, plan, and optimize various industries has become an indispensable practice. Today, we will delve into the fascinating process of turning this voluminous weather data into actionable insights. By combining cutting-edge technology, analytical models, and industrial applications, we’ll explore how weather data can… Read More »Harnessing the power of weather data: A guide to actionable insights
The post Harnessing the power of weather data: A guide to actionable insights appeared first on Data Science Central.
MAGE merges the two key tasks of image generation and recognition, typically trained separately, into a single system.
MIT alumnus’ platform taps the wisdom of crowds to label medical data for AI companies.
Public health organizations have a wealth of data about different types of diseases, health trends, and risk factors. Their staff has long used statistical models and regression analyses to make important decisions such as targeting populations with the highest risk factors for a disease with therapeutics, or forecasting the progression of concerning outbreaks. When public […]
Generative AI technology is improving rapidly, and it’s now possible to generate text and images based on text input. Stable Diffusion is a text-to-image model that empowers you to create photorealistic applications. You can easily generate images from text using Stable Diffusion models through Amazon SageMaker JumpStart. The following are examples of input texts and […]
When making financial decisions, it’s important to look at the big picture — say, one taken from a drone, satellite or AI-powered sensor. The emerging field of spatial finance harnesses AI insights from remote sensors and aerial imagery to help banks, insurers, investment firms and businesses analyze risks and opportunities, enable new services and products, Read article >
A trio of top scientists is helping lead one of the most ambitious efforts in the history of computing — building a digital twin of Earth. Peter Bauer, Bjorn Stevens and Francisco “Paco” Doblas-Reyes agree that a digital twin of Earth needs to support resolutions down to a kilometer so a growing set of users Read article >
It worked like magic. Computer vision algorithms running in a data center saw that a disease was about to infect a distant wheat field in India. Sixteen days later, workers in the field found the first evidence of the outbreak. It was the kind of wizardry people like Vinay Indraganti call digital transformation. He’s practiced Read article >
Scientists at Matice Biosciences are using AI to study the regeneration of tissues in animals known as super-regenerators, such as salamanders and planarians. The goal of the research is to develop new treatments that will help humans heal from injuries without scarring. On the latest episode of NVIDIA’s AI Podcast, host Noah Kravtiz spoke with Read article >
Amazon SageMaker JumpStart is a machine learning (ML) hub that can help you accelerate your ML journey. With SageMaker JumpStart, you can discover and deploy publicly available and proprietary foundation models to dedicated Amazon SageMaker instances for your generative AI applications. SageMaker JumpStart allows you to deploy foundation models from a network isolated environment, and […]
This blog post is co-written with Marat Adayev and Dmitrii Evstiukhin from Provectus. When machine learning (ML) models are deployed into production and employed to drive business decisions, the challenge often lies in the operation and management of multiple models. Machine Learning Operations (MLOps) provides the technical solution to this issue, assisting organizations in managing, […]
Welcome to the shining world of beauty and wellness. This is where makeup artists, skincare devotees, and beauty enthusiasts come together to find the right potion to enhance their beauty. There is, however, a comical conundrum hidden amongst the sea of cosmetic products – the constant struggle to categorize them all! Let’s explore the mysteries… Read More »Cosmetic product recognition system for product categorization using AI & ML
The post Cosmetic product recognition system for product categorization using AI & ML appeared first on Data Science Central.
The state-of-the-art models like GPT-4 and PaLM 2 have demonstrated the ability to perform complex tasks requiring reasoning and decision-making, pushing the boundaries of automated processes. Adding to this advancement, OpenAI’s recent API update empowers developers to define functions and parameters when prompting ‘gpt-4’ and ‘gpt-3.5’ models, making the automation of tasks more practical. … Read More »Automation Game-Changer: Exploring GPT Function Call with AWS S3 Integration
The post Automation Game-Changer: Exploring GPT Function Call with AWS S3 Integration appeared first on Data Science Central.
Leading users and industry-standard benchmarks agree: NVIDIA H100 Tensor Core GPUs deliver the best AI performance, especially on the large language models (LLMs) powering generative AI. H100 GPUs set new records on all eight tests in the latest MLPerf training benchmarks released today, excelling on a new MLPerf test for generative AI. That excellence is Read article >
Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who accelerate 3D workflows and create virtual worlds using NVIDIA Omniverse, a development platform built on Universal Scene Description, aka OpenUSD. As augmented reality (AR) becomes more prominent and accessible across the globe, Kiryl Sidarchuk is Read article >
While generative AI is a relatively new household term, drug discovery company Insilico Medicine has been using it for years to develop new therapies for debilitating diseases. The company’s early bet on deep learning is bearing fruit — a drug candidate discovered using its AI platform is now entering Phase 2 clinical trials to treat Read article >
Picture a world where computing is not limited by the binary confines of zeros and ones, but instead, is free to explore the vast possibilities of continuous value data. Over the past three years a team of Microsoft researchers has been developing a new kind of analog optical computer that uses photons and electrons to […]
The post Unlocking the future of computing: The Analog Iterative Machine’s lightning-fast approach to optimization appeared first on Microsoft Research.
Machine learning (ML) administrators play a critical role in maintaining the security and integrity of ML workloads. Their primary focus is to ensure that users operate with the utmost security, adhering to the principle of least privilege. However, accommodating the diverse needs of different user personas and creating appropriate permission policies can sometimes impede agility. […]
Will coding be a collaborative experience using GitHub Copilot? – Part one GitHub recently released a survey about developer experience which claimed that “AI is here and it’s being used at scale. 92% of U.S.-based developers are already using AI coding tools both in and outside of work.” This metric (92%) has garnered some attention… Read More »Will coding be a collaborative experience using GitHub Copilot? – Part one
The post Will coding be a collaborative experience using GitHub copilot? – Part one appeared first on Data Science Central.
Artificial Intelligence (AI) has emerged as a revolutionary technology that is transforming various industries, and one area where it is making a significant impact is localization. Localization refers to the process of adapting products, services, and content to meet the cultural, linguistic, and functional requirements of a specific target market. With the advent of AI,… Read More »Artificial Intelligence and localization: How AI is changing the landscape
The post Artificial Intelligence and localization: How AI is changing the landscape appeared first on Data Science Central.
There probably isn’t a better time than now to develop an app for your business. By the end of 2023, mobile apps are expected to generate over $935 billion. Customers are hungry for apps that can provide instant access to services. Of course, simply having an app isn’t good enough. Consumers will only use your… Read More »Application analytics: How to leverage analytics during app creation
The post Application analytics: How to leverage analytics during app creation appeared first on Data Science Central.
Tiny deep learning has attracted increasing attention driven by the
substantial demand for deploying deep learning on numerous intelligent
Internet-of-Things devices. However, it is still challenging to unleash tiny
deep learning's full potential on both large-scale datasets and downstream
tasks due to the under-fitting issues caused by the limited model capacity of
tiny neural networks (TNNs). To address this, we propose a framework called
NetBooster to empower tiny deep learning by augmenting the architectures of
TNNs via an expansion-then-contraction strategy. Extensive experiments show
that NetBooster consistently outperforms state-of-the-art tiny deep learning
solutions.
The precise tracking and prediction of polar ice layers can unveil historic
trends in snow accumulation. In recent years, airborne radar sensors, such as
the Snow Radar, have been shown to be able to measure these internal ice layers
over large areas with a fine vertical resolution. In our previous work, we
found that temporal graph convolutional networks perform reasonably well in
predicting future snow accumulation when given temporal graphs containing deep
ice layer thickness. In this work, we experiment with a graph attention
network-based model, using it to predict more annual snow accumulation data
points from fewer input data points on a larger dataset. We find that these
substantial changes degrade performance only slightly.
( 2
min )
Existing studies addressing gender bias in pre-trained language models
usually build a small gender-neutral dataset and conduct a second phase of
pre-training on the model with such data. However, given the limited size and
concentrated focus of the gender-neutral data, catastrophic forgetting can
occur during second-phase pre-training. Forgetting information from the
original training data may damage the model's downstream performance by a
large margin.
In this work, we empirically show that catastrophic forgetting occurs in such
methods by evaluating them with general NLP tasks in GLUE. Then, we propose a
new method, GEnder Equality Prompt (GEEP), to improve gender fairness of
pre-trained models with less forgetting. GEEP freezes the pre-trained model and
learns gender-related prompts with gender-neutral data. Empirical results show
that GEEP not only achieves SOTA performances on gender fairness tasks, but
also forgets less and performs better on GLUE by a large margin.
( 2
min )
The concepts of overfitting and generalization are vital for evaluating
machine learning models. In this work, we show that the popular Recall@K metric
depends on the number of classes in the dataset, which limits its ability to
estimate generalization. To fix this issue, we propose a new metric, which
measures retrieval performance, and, unlike Recall@K, estimates generalization.
We apply the proposed metric to popular image retrieval methods and provide new
insights about deep metric learning generalization.
( 2
min )
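As a point of reference, the Recall@K computation discussed above can be sketched in a few lines (a toy illustration; the paper's proposed replacement metric is not shown here):

```python
import numpy as np

def recall_at_k(query_labels, gallery_labels, dists, k):
    """Fraction of queries whose k nearest gallery items contain
    at least one item of the query's class."""
    hits = 0
    for i, q in enumerate(query_labels):
        nearest = np.argsort(dists[i])[:k]   # indices of k closest gallery items
        hits += int(np.any(gallery_labels[nearest] == q))
    return hits / len(query_labels)

# Toy example: 2 queries against a 4-item gallery.
q_lab = np.array([0, 1])
g_lab = np.array([0, 0, 1, 1])
dists = np.array([[0.1, 0.5, 0.9, 0.8],
                  [0.9, 0.8, 0.1, 0.5]])
r_at_1 = recall_at_k(q_lab, g_lab, dists, k=1)
```

Note how the score depends on how many gallery items share each query's class, which hints at the class-count dependence the abstract highlights.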
We present DiffInfinite, a hierarchical diffusion model that generates
arbitrarily large histological images while preserving long-range correlation
structural information. Our approach first generates synthetic segmentation
masks, subsequently used as conditions for the high-fidelity generative
diffusion process. The proposed sampling method can be scaled up to any desired
image size while only requiring small patches for fast training. Moreover, it
can be parallelized more efficiently than previous large-content generation
methods while avoiding tiling artefacts. The training leverages classifier-free
guidance to augment a small, sparsely annotated dataset with unlabelled data.
Our method alleviates unique challenges in histopathological imaging practice:
large-scale information, costly manual annotation, and protective data
handling. The biological plausibility of DiffInfinite data is validated in a
survey by ten experienced pathologists as well as a downstream segmentation
task. Furthermore, the model scores strongly on anti-copying metrics which is
beneficial for the protection of patient data.
( 2
min )
Deep learning approaches for jet tagging in high-energy physics are
characterized as black boxes that process a large amount of information from
which it is difficult to extract key distinctive observables. In this
proceeding, we present an alternative to deep learning approaches, Boost
Invariant Polynomials, which enables direct analysis of simple analytic
expressions representing the most important features in a given task. Further,
we show how this approach provides an extremely low-dimensional classifier with
a minimal set of features representing physically relevant discriminating
observables, and how it consequently speeds up algorithm execution while
performing comparably to the algorithm using the full information.
( 2
min )
We consider an aggregated human-AI collaboration aimed at generating a joint
interpretable model. The model takes the form of Boolean decision rules, where
human input is provided in the form of logical conditions or as partial
templates. This focus on the combined construction of a model offers a
different perspective on joint decision making. Previous efforts have typically
focused on aggregating outcomes rather than decision logic. We demonstrate the
proposed approach through two examples and highlight the usefulness and
challenges of the approach.
( 2
min )
A recent alternative for hydrogen transportation is blending it into natural
gas pipelines as a mixture with natural gas. However, hydrogen embrittlement
of the material is a major concern for scientists and gas installation
designers seeking to avoid process failures. In this paper, we propose a
physics-informed machine learning model to predict the gas pressure on the
pipes' inner wall. Despite their high-fidelity results, current PDE-based
simulators are time- and computationally demanding. Using simulation data, we
train an ML model to predict the pressure on the pipelines' inner walls, a
first step toward pipeline system surveillance. We found that the
physics-based method outperformed the purely data-driven method and satisfies
the physical constraints of the gas flow system.
( 2
min )
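A physics-informed loss of the kind described above typically combines a data-misfit term with a penalty on the governing-equation residual. A minimal sketch (the residual and weight here are hypothetical, not the paper's formulation):

```python
import numpy as np

def physics_informed_loss(pred, target, residual, weight=0.1):
    # Data-misfit term plus a penalty on the physics residual
    # (e.g. a discretized flow-equation violation). `weight` is a
    # hypothetical trade-off coefficient.
    data_loss = np.mean((pred - target) ** 2)
    phys_loss = np.mean(residual ** 2)
    return data_loss + weight * phys_loss

# Toy values: predicted vs. simulator pressures, plus residuals.
loss = physics_informed_loss(np.array([1.0, 2.0]),
                             np.array([1.0, 1.0]),
                             np.array([0.5, 0.5]))
```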
The accurate prediction and estimation of annual snow accumulation has grown
in importance as we deal with the effects of climate change and the increase of
global atmospheric temperatures. Airborne radar sensors, such as the Snow
Radar, are able to measure accumulation rate patterns at a large scale and
monitor the effects of ongoing climate change on Greenland's precipitation and
run-off. The Snow Radar's use of an ultra-wide bandwidth enables a fine
vertical resolution that helps in capturing internal ice layers. Given the
amount of snow accumulation in previous years using the radar data, in this
paper, we propose a machine learning model based on recurrent graph
convolutional networks to predict the snow accumulation in recent consecutive
years at a certain location. We found that the model performs better and with
more consistency than equivalent nongeometric and nontemporal models.
( 2
min )
End-to-end design of communication systems using deep autoencoders (AEs) is
gaining attention due to its flexibility and excellent performance. Besides
single-user transmission, AE-based design is recently explored in multi-user
setup, e.g., for designing constellations for non-orthogonal multiple access
(NOMA). In this paper, we further advance the design of AE-based downlink NOMA
by introducing weighted loss function in the AE training. By changing the
weight coefficients, one can flexibly tune the constellation design to balance
error probability of different users, without relying on explicit information
about their channel quality. Combined with the SICNet decoder, the proposed
weighted AE-based framework achieves significantly improved error-probability
levels and flexible control of the error probability of different users.
( 2
min )
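The weighted loss idea above can be sketched as a per-user weighted sum of cross-entropies (a minimal illustration with hypothetical weights, not the paper's exact training objective):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, targets):
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(targets)), targets]))

def weighted_noma_loss(logits_per_user, targets_per_user, weights):
    # Weighted sum of per-user cross-entropies; the weight vector
    # (hypothetical values here) tunes the error trade-off between users.
    return sum(w * cross_entropy(l, t)
               for l, t, w in zip(logits_per_user, targets_per_user, weights))

rng = np.random.default_rng(0)
logits = [rng.normal(size=(8, 4)), rng.normal(size=(8, 4))]   # 2 users, 4 symbols
targets = [rng.integers(0, 4, size=8), rng.integers(0, 4, size=8)]
loss = weighted_noma_loss(logits, targets, weights=[0.7, 0.3])
```

Raising one user's weight biases training toward lowering that user's error probability without any explicit channel-quality input.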
This paper focuses on studying the impact of climate data and vector larval
indices on dengue outbreaks. After a comparative study of various LSTM models,
a Bidirectional Stacked LSTM network is selected to analyze the time series
climate data and health data collected for the state of Tamil Nadu (India)
for the period 2014 to 2020. The prediction accuracy of the model is
significantly improved by including the mosquito larval index, an indicator
of vector-borne disease (VBD) control measures.
( 2
min )
To achieve virtual certification for industrial design, quantifying the
uncertainties in simulation-driven processes is crucial. We discuss a
physics-constrained approach to account for epistemic uncertainty of turbulence
models. In order to eliminate user input, we incorporate a data-driven machine
learning strategy. In addition, our study focuses on developing an a priori
estimate of prediction confidence when accurate data is scarce.
( 2
min )
Recent advancements in federated learning (FL) seek to increase client-level
performance by fine-tuning client parameters on local data or personalizing
architectures for the local task. Existing methods for such personalization
either prune a global model or fine-tune a global model on a local client
distribution. However, these existing methods either personalize at the expense
of retaining important global knowledge, or predetermine network layers for
fine-tuning, resulting in suboptimal storage of global knowledge within client
models. Inspired by the lottery ticket hypothesis, we first introduce a
hypothesis for finding optimal client subnetworks to locally fine-tune while
leaving the rest of the parameters frozen. We then propose a novel FL
framework, FedSelect, using this procedure that directly personalizes both
client subnetwork structure and parameters, via the simultaneous discovery of
optimal parameters for personalization and the rest of parameters for global
aggregation during training. We show that this method achieves promising
results on CIFAR-10.
( 2
min )
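A minimal sketch of the subnetwork fine-tuning idea (the mask here is fixed and hypothetical; the paper discovers it during training):

```python
import numpy as np

def fedselect_step(global_params, local_grad, mask, lr=0.1):
    # Sketch (not the paper's exact procedure): parameters selected by
    # `mask` are fine-tuned locally; the rest stay frozen at the global
    # values and would be sent back for aggregation.
    return global_params - lr * local_grad * mask

w = np.ones(6)                                   # toy global parameters
g = np.full(6, 0.5)                              # toy local gradient
mask = np.array([1, 0, 1, 0, 0, 1], dtype=float)  # hypothetical subnetwork
w_pers = fedselect_step(w, g, mask)
```

Only masked entries move; unmasked entries retain the global knowledge.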
Motivated by the novel paradigm developed by Van Roy and coauthors for
reinforcement learning in arbitrary non-Markovian environments, we propose a
related formulation and explicitly pin down the error caused by
non-Markovianity of observations when the Q-learning algorithm is applied on
this formulation. Based on this observation, we propose that the criterion for
agent design should be to seek good approximations for certain conditional
laws. Inspired by classical stochastic control, we show that our problem
reduces to that of recursive computation of approximate sufficient statistics.
This leads to an autoencoder-based scheme for agent design which is then
numerically tested on partially observed reinforcement learning environments.
( 2
min )
Predicting the presence of major depressive disorder (MDD) using behavioural
and cognitive signals is a highly non-trivial task. The heterogeneous clinical
profile of MDD means that any given speech, facial expression and/or observed
cognitive pattern may be associated with a unique combination of depressive
symptoms. Conventional discriminative machine learning models potentially lack
the complexity to robustly model this heterogeneity. Bayesian networks,
however, may instead be well-suited to such a scenario. These networks are
probabilistic graphical models that efficiently describe the joint probability
distribution over a set of random variables by explicitly capturing their
conditional dependencies. This framework provides further advantages over
standard discriminative modelling by offering the possibility to incorporate
expert opinion in the graphical structure of the models, generating explainable
model predictions, informing about the uncertainty of predictions, and
naturally handling missing data. In this study, we apply a Bayesian framework
to capture the relationships between depression, depression symptoms, and
features derived from speech, facial expression and cognitive game data
collected at thymia.
( 2
min )
In this brief note, we formulate Principal Component Analysis (PCA) over
datasets consisting not of points but of distributions, characterized by their
location and covariance. Just like the usual PCA on points can be equivalently
derived via a variance-maximization principle and via a minimization of
reconstruction error, we derive a closed-form solution for distributional PCA
from both of these perspectives.
( 2
min )
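The abstract does not spell out the closed form; one natural sketch, under the assumption (ours, not necessarily the paper's) that the relevant second moment decomposes into the covariance of the locations plus the average within-distribution covariance, is:

```python
import numpy as np

def distributional_pca(mus, sigmas, n_components=1):
    """PCA over distributions given by means `mus` (n, d) and
    covariances `sigmas` (n, d, d). Assumes the total second moment is
    the between-means covariance plus the average within-distribution
    covariance (an illustrative assumption)."""
    mus = np.asarray(mus)
    between = np.cov(mus, rowvar=False, bias=True)
    within = np.mean(sigmas, axis=0)
    vals, vecs = np.linalg.eigh(between + within)
    order = np.argsort(vals)[::-1]               # descending variance
    return vals[order][:n_components], vecs[:, order][:, :n_components]

# Three isotropic Gaussians spread along the x-axis.
mus = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
sigmas = [np.eye(2) * 0.1] * 3
vals, vecs = distributional_pca(mus, sigmas)
```

The leading direction aligns with the spread of the locations, as expected.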
We study the asymptotic generalization of an overparameterized linear model
for multiclass classification under the Gaussian covariates bi-level model
introduced in Subramanian et al. '22, where the number of data points,
features, and classes all grow together. We fully resolve the conjecture posed
in Subramanian et al. '22, matching the predicted regimes for generalization.
Furthermore, our new lower bounds are akin to an information-theoretic strong
converse: they establish that the misclassification rate goes to 0 or 1
asymptotically. One surprising consequence of our tight results is that the
min-norm interpolating classifier can be asymptotically suboptimal relative to
noninterpolating classifiers in the regime where the min-norm interpolating
regressor is known to be optimal.
The key to our tight analysis is a new variant of the Hanson-Wright
inequality which is broadly useful for multiclass problems with sparse labels.
As an application, we show that the same type of analysis can be used to
analyze the related multilabel classification problem under the same bi-level
ensemble.
( 2
min )
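For context, the classical Hanson-Wright inequality (the paper's variant is not stated in the abstract) reads:

```latex
% Classical Hanson-Wright inequality: for a random vector
% x = (x_1, \dots, x_n) with independent, mean-zero, sub-Gaussian
% entries (\|x_i\|_{\psi_2} \le K) and a fixed matrix A,
\[
\mathbb{P}\bigl(\lvert x^\top A x - \mathbb{E}\,x^\top A x\rvert > t\bigr)
\;\le\; 2\exp\!\left(-c\,\min\!\left(\frac{t^2}{K^4\|A\|_F^2},\;
\frac{t}{K^2\|A\|}\right)\right),
\]
% where \|A\|_F is the Frobenius norm, \|A\| the operator norm, and
% c > 0 an absolute constant.
```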
We establish generic uniform convergence guarantees for Gaussian data in
terms of the Rademacher complexity of the hypothesis class and the Lipschitz
constant of the square root of the scalar loss function. We show how these
guarantees substantially generalize previous results based on smoothness
(Lipschitz constant of the derivative), and allow us to handle the broader
class of square-root-Lipschitz losses, which includes also non-smooth loss
functions appropriate for studying phase retrieval and ReLU regression, as well
as rederive and better understand "optimistic rate" and interpolation learning
guarantees.
( 2
min )
In many numerical simulations stochastic gradient descent (SGD) type
optimization methods perform very effectively in the training of deep neural
networks (DNNs) but till this day it remains an open problem of research to
provide a mathematical convergence analysis which rigorously explains the
success of SGD type optimization methods in the training of DNNs. In this work
we study SGD type optimization methods in the training of fully-connected
feedforward DNNs with rectified linear unit (ReLU) activation. We first
establish general regularity properties for the risk functions and their
generalized gradient functions appearing in the training of such DNNs and,
thereafter, we investigate the plain vanilla SGD optimization method in the
training of such DNNs under the assumption that the target function under
consideration is a constant function. Specifically, under the assumptions
that the learning rates (the step sizes of the SGD optimization method) are
sufficiently small but not $L^1$-summable and that the target function is a
constant function, we prove that the expectation of the risk of the
considered SGD process converges to zero as the number of SGD steps increases
to infinity.
( 3
min )
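The setting above can be illustrated with a toy run (a sketch, not the paper's construction): plain SGD on a small ReLU network with a constant target and step sizes $0.05/\sqrt{n}$, which vanish but are not $L^1$-summable:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 3, 8                                    # input dim, hidden width
W1 = 0.5 * rng.normal(size=(h, d)); b1 = np.zeros(h)
w2 = 0.5 * rng.normal(size=h); b2 = 0.0
target = 1.0                                   # constant target function

def forward(x):
    z = np.maximum(W1 @ x + b1, 0.0)           # ReLU hidden layer
    return w2 @ z + b2, z

Xtest = rng.normal(size=(200, d))
risk0 = np.mean([(forward(x)[0] - target) ** 2 for x in Xtest])

for n in range(1, 5001):
    x = rng.normal(size=d)
    y, z = forward(x)
    err = y - target
    lr = 0.05 / np.sqrt(n)                     # vanishing, sum diverges
    grad_hidden = err * w2 * (z > 0)           # backprop through ReLU
    w2 -= lr * err * z; b2 -= lr * err
    W1 -= lr * np.outer(grad_hidden, x); b1 -= lr * grad_hidden

risk = np.mean([(forward(x)[0] - target) ** 2 for x in Xtest])
```

The empirical risk drops from its initial value, consistent with (though of course not proving) the convergence statement.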
We consider a new framework where a continuous, though bounded, random
variable has unobserved bounds that vary over time. In the context of
univariate time series, we look at the bounds as parameters of the distribution
of the bounded random variable. We introduce an extended log-likelihood
estimation and design algorithms to track the bound through online maximum
likelihood estimation. Since the resulting optimization problem is not convex,
we make use of recent theoretical results on Normalized Gradient Descent (NGD)
for quasiconvex optimization, to eventually derive an Online Normalized
Gradient Descent algorithm. We illustrate and discuss the workings of our
approach based on both simulation studies and a real-world wind power
forecasting problem.
( 2
min )
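The core update is a fixed-length normalized gradient step; a minimal sketch on a toy quasiconvex objective (not the paper's bound-tracking likelihood):

```python
import numpy as np

def ngd_minimize(grad_fn, theta0, lr=0.05, n_steps=500):
    """Normalized Gradient Descent: steps of fixed length lr along
    -g/||g||, suited to quasiconvex objectives where raw gradient
    magnitudes are uninformative."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_steps):
        g = grad_fn(theta)
        norm = np.linalg.norm(g)
        if norm < 1e-12:                       # (sub)gradient vanished
            break
        theta = theta - lr * g / norm
    return theta

# Toy quasiconvex objective sqrt(|x - 3|): its gradient direction is
# simply sign(x - 3), which is all NGD uses.
grad = lambda th: np.sign(th - 3.0)
theta = ngd_minimize(grad, np.array([10.0]))
```

The iterate walks to within one step length of the minimizer and then oscillates around it; an online variant would take such a step per incoming observation.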
Most iterative neural network training methods use estimates of the loss
function over small random subsets (or minibatches) of the data to update the
parameters, which aid in decoupling the training time from the (often very
large) size of the training datasets. Here, we show that a minibatch approach
can also be used to train neural network ensembles (NNEs) via trajectory
methods in a highly efficient manner. We illustrate this approach by training
NNEs to classify images in the MNIST dataset. The method improves training
times, which scale as the ratio of the dataset size to the average minibatch
size; in the case of MNIST, this yields a computational improvement of
typically two orders of magnitude. We
highlight the advantage of using longer trajectories to represent NNEs, both
for improved accuracy in inference and reduced update cost in terms of the
samples needed in minibatch updates.
( 2
min )
While research in the field of transformer models has primarily focused on
enhancing performance metrics such as accuracy and perplexity, practical
applications in industry often necessitate a rigorous consideration of
inference latency constraints. Addressing this challenge, we introduce
SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes
accuracy whilst adhering to an upper-bound latency constraint. Our method
incorporates 8-bit integer quantization in the search process to outperform the
current state-of-the-art technique. Our results underline the feasibility and
efficacy of seeking an optimal balance between performance and latency,
providing new avenues for deploying state-of-the-art transformer models in
latency-sensitive environments.
( 2
min )
The recent advent of play-to-earn (P2E) systems in massively multiplayer
online role-playing games (MMORPGs) has made in-game goods interchangeable with
real-world values more than ever before. The goods in the P2E MMORPGs can be
directly exchanged with cryptocurrencies such as Bitcoin, Ethereum, or Klaytn
via blockchain networks. Unlike traditional in-game goods, once P2E goods have
been written to the blockchain, they cannot be restored by the game operation
teams, even in cases of chargeback fraud such as payment fraud, cancellation,
or refund. To tackle this problem, we propose a novel chargeback fraud
prediction method, PU GNN, which leverages graph attention networks with a PU
loss to capture both the players' in-game behavior and their P2E token
transaction patterns.
With the adoption of modified GraphSMOTE, the proposed model handles the
imbalanced distribution of labels in chargeback fraud datasets. The conducted
experiments on three real-world P2E MMORPG datasets demonstrate that PU GNN
achieves superior performances over previously suggested methods.
( 3
min )
Task and Motion Planning (TAMP) approaches are effective at planning
long-horizon autonomous robot manipulation. However, because they require a
planning model, it can be difficult to apply them to domains where the
environment and its dynamics are not fully known. We propose to overcome these
limitations by leveraging deep generative modeling, specifically diffusion
models, to learn constraints and samplers that capture these
difficult-to-engineer aspects of the planning model. These learned samplers are
composed and combined within a TAMP solver in order to find action parameter
values jointly that satisfy the constraints along a plan. To tractably make
predictions for unseen objects in the environment, we define these samplers on
low-dimensional learned latent embeddings of changing object state. We evaluate
our approach in an articulated object manipulation domain and show how the
combination of classical TAMP, generative learning, and latent embeddings
enables long-horizon constraint-based reasoning.
( 2
min )
We study the cost of overfitting in noisy kernel ridge regression (KRR),
which we define as the ratio between the test error of the interpolating
ridgeless model and the test error of the optimally-tuned model. We take an
"agnostic" view in the following sense: we consider the cost as a function of
sample size for any target function, even if the sample size is not large
enough for consistency or the target is outside the RKHS. We analyze the cost
of overfitting under a Gaussian universality ansatz using recently derived
(non-rigorous) risk estimates in terms of the task eigenstructure. Our analysis
provides a more refined characterization of benign, tempered and catastrophic
overfitting (cf. Mallinar et al., 2022).
( 2
min )
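The cost of overfitting can be estimated empirically as the ratio of test errors (a toy sketch with an RBF kernel and a hypothetical regularization grid, not the paper's eigenstructure-based analysis):

```python
import numpy as np

def krr_test_mse(Xtr, ytr, Xte, yte, lam, gamma=1.0):
    # Kernel ridge regression with an RBF kernel; lam -> 0 approaches
    # the interpolating ridgeless solution.
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    K = rbf(Xtr, Xtr)
    alpha = np.linalg.solve(K + lam * np.eye(len(Xtr)), ytr)
    return np.mean((rbf(Xte, Xtr) @ alpha - yte) ** 2)

rng = np.random.default_rng(0)
Xtr = rng.uniform(-1, 1, size=(40, 1))
ytr = np.sin(3 * Xtr[:, 0]) + 0.3 * rng.normal(size=40)   # noisy labels
Xte = rng.uniform(-1, 1, size=(200, 1))
yte = np.sin(3 * Xte[:, 0])                               # clean test targets

mse_ridgeless = krr_test_mse(Xtr, ytr, Xte, yte, lam=1e-10)
mse_tuned = min(krr_test_mse(Xtr, ytr, Xte, yte, lam)
                for lam in [1e-3, 1e-2, 1e-1, 1.0])
cost_of_overfitting = mse_ridgeless / mse_tuned
```

A ratio above 1 indicates that interpolating the noise hurt relative to the best-tuned model on this sample.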
Time series motifs are used for discovering higher-order structures of time
series data. Based on time series motifs, the motif embedding correlation field
(MECF) is proposed to characterize higher-order temporal structures of
dynamical system time series. A MECF-based unsupervised learning approach is
applied in locating the source of the forced oscillation (FO), a periodic
disturbance that detrimentally impacts power grids. Locating the FO source is
imperative for system stability. Compared with the Fourier analysis, the
MECF-based unsupervised learning is applicable under various FO situations,
including the single FO, FO with resonance, and multiple sources FOs. The
MECF-based unsupervised learning is a data-driven approach that requires no
prior knowledge of system models or topologies. Tests on the UK
high-voltage transmission grid illustrate the effectiveness of MECF-based
unsupervised learning. In addition, the impacts of coupling strength and
measurement noise on locating the FO source by the MECF-based unsupervised
learning are investigated.
( 2
min )
In federated learning, data heterogeneity is a critical challenge. A
straightforward solution is to shuffle the clients' data to homogenize the
distribution. However, this may violate data access rights, and how and when
shuffling can accelerate the convergence of a federated optimization algorithm
is not theoretically well understood. In this paper, we establish a precise and
quantifiable correspondence between data heterogeneity and parameters in the
convergence rate when a fraction of data is shuffled across clients. We prove
that shuffling can quadratically reduce the gradient dissimilarity with respect
to the shuffling percentage, accelerating convergence. Inspired by the theory,
we propose a practical approach that addresses the data access rights issue by
shuffling locally generated synthetic data. The experimental results show that
shuffling synthetic data improves the performance of multiple existing
federated learning algorithms by a large margin.
( 2
min )
Training normalizing flow generative models can be challenging due to the
need to calculate computationally expensive determinants of Jacobians. This
paper studies the likelihood-free training of flows and proposes the energy
objective, an alternative sample-based loss based on proper scoring rules. The
energy objective is determinant-free and supports flexible model architectures
that are not easily compatible with maximum likelihood training, including
semi-autoregressive energy flows, a novel model family that interpolates
between fully autoregressive and non-autoregressive models. Energy flows
feature competitive sample quality, posterior inference, and generation speed
relative to likelihood-based flows; this performance is decorrelated from the
quality of log-likelihood estimates, which are generally very poor. Our
findings question the use of maximum likelihood as an objective or a metric,
and contribute to a scientific study of its role in generative modeling.
( 2
min )
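The determinant-free energy objective can be sketched as a sample-based energy distance (a minimal illustration; the paper's exact estimator may differ):

```python
import numpy as np

def energy_score(samples, data):
    """Sample-based energy objective (an energy-distance proper scoring
    rule): E||X-Y|| - 0.5 E||X-X'|| - 0.5 E||Y-Y'||. Determinant-free,
    so no flow Jacobian is needed. (Diagonal zero-distance pairs are
    included for simplicity.)"""
    def mean_dist(A, B):
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1).mean()
    return (mean_dist(samples, data)
            - 0.5 * mean_dist(samples, samples)
            - 0.5 * mean_dist(data, data))

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 2))
good = rng.normal(size=(100, 2))              # matches the data distribution
bad = rng.normal(loc=3.0, size=(100, 2))      # shifted "model" samples
```

Well-matched samples score near zero, while mismatched ones score high, so minimizing this quantity over model samples trains the generator without any likelihood.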
Graph generative model evaluation necessitates understanding differences
between graphs on the distributional level. This entails being able to harness
salient attributes of graphs in an efficient manner. Curvature constitutes one
such property of graphs, and has recently started to prove useful in
characterising graphs. Its expressive properties, stability, and practical
utility in model evaluation remain largely unexplored, however. We combine
graph curvature descriptors with emerging methods from topological data
analysis to obtain robust, expressive descriptors for evaluating graph
generative models.
( 2
min )
Gaussianization is a simple generative model that can be trained without
backpropagation. It has shown compelling performance on low dimensional data.
As the dimension increases, however, it has been observed that the convergence
speed slows down. We show analytically that the number of required layers
scales linearly with the dimension for Gaussian input. We argue that this is
because the model is unable to capture dependencies between dimensions.
Empirically, we find the same linear increase in cost for arbitrary input
$p(x)$, but observe favorable scaling for some distributions. We explore
potential speed-ups and formulate challenges for further research.
( 2
min )
Transformer architectures are complex and their use in NLP, while it has
engendered many successes, makes their interpretability or explainability
challenging. Recent debates have shown that attention maps and attribution
methods are unreliable (Pruthi et al., 2019; Brunner et al., 2019). In this
paper, we present some of their limitations and introduce COCKATIEL, which
successfully addresses some of them. COCKATIEL is a novel, post-hoc,
concept-based, model-agnostic XAI technique that generates meaningful
explanations from the last layer of a neural network trained on an NLP
classification task. It uses Non-Negative Matrix Factorization (NMF) to
discover the concepts the model leverages to make predictions, and a
sensitivity analysis to accurately estimate the importance of each of these
concepts for the model. It does so without compromising the accuracy of the
underlying model or requiring a new one to be trained. We conduct experiments
on single- and multi-aspect sentiment analysis tasks and show COCKATIEL's
superior ability to discover concepts that align with human ones on
Transformer models without any supervision; we objectively verify the
faithfulness of its explanations through fidelity metrics, and we showcase
its ability to provide meaningful explanations on two different datasets.
( 3
min )
Predictive pattern mining is an approach used to construct prediction models
when the input is represented by structured data, such as sets, graphs, and
sequences. The main idea behind predictive pattern mining is to build a
prediction model by considering substructures, such as subsets, subgraphs, and
subsequences (referred to as patterns), present in the structured data as
features of the model. The primary challenge in predictive pattern mining lies
in the exponential growth of the number of patterns with the complexity of the
structured data. In this study, we propose the Safe Pattern Pruning (SPP)
method to address the explosion of pattern numbers in predictive pattern
mining. We also discuss how it can be effectively employed throughout the
entire model building process in practical data analysis. To demonstrate the
effectiveness of the proposed method, we conduct numerical experiments on
regression and classification problems involving sets, graphs, and sequences.
( 2
min )
Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code. SageMaker Data Wrangler supports Snowflake, a popular […]
( 12
min )
For data scientists, moving machine learning (ML) models from proof of concept to production often presents a significant challenge. One of the main challenges can be deploying a well-performing, locally trained model to the cloud for inference and use in other applications. It can be cumbersome to manage the process, but with the right tool, […]
( 10
min )
Researchers at Yamagata University in Japan have harnessed AI to uncover four previously unseen geoglyphs — images on the ground, some as wide as 1,200 feet, made using the land’s elements — in Nazca, a seven-hour drive south of Lima, Peru. The geoglyphs depict a humanoid, a pair of legs, a fish and a bird.
( 4
min )
We compute how small input perturbations affect the output of deep neural
networks, exploring an analogy between deep networks and dynamical systems,
where the growth or decay of local perturbations is characterised by
finite-time Lyapunov exponents. We show that the maximal exponent forms
geometrical structures in input space, akin to coherent structures in dynamical
systems. Ridges of large positive exponents divide input space into different
regions that the network associates with different classes. These ridges
visualise the geometry that deep networks construct in input space, shedding
light on the fundamental mechanisms underlying their learning capabilities.
( 2
min )
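The maximal finite-time Lyapunov exponent of a network can be estimated from its input-output Jacobian; a toy sketch (a tiny tanh network standing in for a deep classifier, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)
# The maximal finite-time Lyapunov exponent at x is the log of the
# largest singular value of the input-output Jacobian, divided by the
# network depth L.
Ws = [rng.normal(size=(5, 2)), rng.normal(size=(5, 5)), rng.normal(size=(2, 5))]

def net(x):
    for W in Ws[:-1]:
        x = np.tanh(W @ x)
    return Ws[-1] @ x

def max_ftle(x, eps=1e-6):
    L = len(Ws)
    J = np.zeros((2, 2))
    for j in range(2):
        e = np.zeros(2); e[j] = eps
        J[:, j] = (net(x + e) - net(x - e)) / (2 * eps)   # central differences
    return np.log(np.linalg.svd(J, compute_uv=False)[0]) / L

lam = max_ftle(np.array([0.3, -0.2]))
```

Scanning `max_ftle` over a grid of inputs would reveal the ridges of large positive exponents that the abstract describes.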
Spaces with locally varying scales of measurement, like multidimensional
structures with differently scaled dimensions, are common in statistics and
machine learning. Nevertheless, it remains an open question how to properly
exploit the entire information encoded in them. We address this
problem by considering an order based on (sets of) expectations of random
variables mapping into such non-standard spaces. This order contains stochastic
dominance and expectation order as extreme cases when no, or respectively
perfect, cardinal structure is given. We derive a (regularized) statistical
test for our proposed generalized stochastic dominance (GSD) order,
operationalize it by linear optimization, and robustify it by imprecise
probability models. Our findings are illustrated with data from
multidimensional poverty measurement, finance, and medicine.
( 2
min )
We focus on decentralized stochastic non-convex optimization, where $n$
agents work together to optimize a composite objective function which is a sum
of a smooth term and a non-smooth convex term. To solve this problem, we
propose two single-time scale algorithms: Prox-DASA and Prox-DASA-GT. These
algorithms can find $\epsilon$-stationary points in
$\mathcal{O}(n^{-1}\epsilon^{-2})$ iterations using constant batch sizes (i.e.,
$\mathcal{O}(1)$). Unlike prior work, our algorithms achieve comparable
complexity without requiring large batch sizes, more complex per-iteration
operations (such as double loops), or stronger assumptions. Our theoretical
findings are supported by extensive numerical experiments, which demonstrate
the superiority of our algorithms over previous approaches. Our code is
available at https://github.com/xuxingc/ProxDASA.
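For readers unfamiliar with composite optimization, the basic building block of such methods is the proximal step; the sketch below shows it for an $\ell_1$ non-smooth term (the decentralized averaging and gradient tracking that distinguish Prox-DASA and Prox-DASA-GT are not reproduced here):

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal operator of lam * ||x||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def prox_grad_step(x, grad, lr, lam):
    """One proximal gradient step for a composite objective
    smooth(x) + lam * ||x||_1: gradient step on the smooth part,
    then the prox of the non-smooth part."""
    return prox_l1(x - lr * grad, lr * lam)
```

In the decentralized setting, each agent would interleave such steps with averaging over its neighbors.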
( 2
min )
In recent years, studies such as Carmon et al. (2019), Gowal et al. (2021),
and Xing et al. (2022) have demonstrated that incorporating additional real or
generated data with pseudo-labels can enhance adversarial training through a
two-stage training
approach. In this paper, we perform a theoretical analysis of the asymptotic
behavior of this method in high-dimensional linear regression. While a
double-descent phenomenon can be observed in ridgeless training, with an
appropriate $\mathcal{L}_2$ regularization, the two-stage adversarial training
achieves a better performance. Finally, we derive a shortcut cross-validation
formula specifically tailored for the two-stage training method.
( 2
min )
We show that any randomized first-order algorithm which minimizes a
$d$-dimensional, $1$-Lipschitz convex function over the unit ball must either
use $\Omega(d^{2-\delta})$ bits of memory or make $\Omega(d^{1+\delta/6-o(1)})$
queries, for any constant $\delta\in (0,1)$ and when the precision $\epsilon$
is quasipolynomially small in $d$. Our result implies that cutting plane
methods, which use $\tilde{O}(d^2)$ bits of memory and $\tilde{O}(d)$ queries,
are Pareto-optimal among randomized first-order algorithms, and quadratic
memory is required to achieve optimal query complexity for convex optimization.
( 2
min )
Markov chain Monte Carlo (MCMC) algorithms have played a significant role in
statistics, physics, machine learning, and other fields, and they are the only
known general and efficient approach for some high-dimensional problems. The
random walk Metropolis (RWM) algorithm, the most classical MCMC algorithm, has
had a great influence on the development and practice of science and engineering. The
behavior of the RWM algorithm in high-dimensional problems is typically
investigated through a weak convergence result of diffusion processes. In this
paper, we utilize the Mosco convergence of Dirichlet forms in analyzing the RWM
algorithm on large graphs, whose target distribution is the Gibbs measure that
includes any probability measure satisfying a Markov property. The abstract and
powerful theory of Dirichlet forms allows us to work directly and naturally on
the infinite-dimensional space, and our notion of Mosco convergence allows
Dirichlet forms associated with the RWM chains to lie on changing Hilbert
spaces. Through the optimal scaling problem, we demonstrate the impressive
strengths of the Dirichlet form approach over the standard diffusion approach.
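For reference, the RWM chain under study is the classic accept/reject random walk; a minimal sketch on a simple target (not the graph Gibbs measures analyzed in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def rwm(log_target, x0, step, n_steps):
    """Random walk Metropolis: propose x' = x + step * z with z ~ N(0, I),
    accept with probability min(1, pi(x') / pi(x))."""
    x = np.asarray(x0, dtype=float)
    chain, accepts = [x.copy()], 0
    logp = log_target(x)
    for _ in range(n_steps):
        prop = x + step * rng.standard_normal(x.shape)
        logp_prop = log_target(prop)
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = prop, logp_prop
            accepts += 1
        chain.append(x.copy())
    return np.array(chain), accepts / n_steps

# Target: standard 1D Gaussian; a proposal scale around 2.4 is near-optimal here.
chain, acc = rwm(lambda x: -0.5 * np.sum(x**2), x0=np.zeros(1), step=2.4, n_steps=20000)
```

The optimal scaling question mentioned above asks how `step` must shrink with dimension to keep the acceptance rate (and efficiency) non-degenerate.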
( 2
min )
The library scikit-fda is a Python package for Functional Data Analysis
(FDA). It provides a comprehensive set of tools for representation,
preprocessing, and exploratory analysis of functional data. The library is
built upon and integrated into Python's scientific ecosystem. In particular, it
conforms to the scikit-learn application programming interface so as to take
advantage of the functionality for machine learning provided by this package:
pipelines, model selection, and hyperparameter tuning, among others. The
scikit-fda package has been released as free and open-source software under a
3-Clause BSD license and is open to contributions from the FDA community. The
library's extensive documentation includes step-by-step tutorials and detailed
examples of use.
( 2
min )
Six teams conducting research in AI, data science, and machine learning receive funding for projects that have potential commercial applications.
( 9
min )
Large AI models are transforming the digital world. Generative language models like Turing-NLG, ChatGPT, and GPT-4, powered by large language models (LLMs), are incredibly versatile, capable of performing tasks like summarization, coding, and translation. Similarly, large multimodal generative models like DALL·E, Microsoft Designer, and Bing Image Creator can generate art, architecture, videos, and other digital […]
The post DeepSpeed ZeRO++: A leap in speed for LLM and chat model training with 4X less communication appeared first on Microsoft Research.
( 16
min )
Researcher Bichlien Nguyen is an organic electrochemist turned technologist. Professor David Kwabi is a mechanical engineer. Their work uses ML to help discover organic compounds for renewable energy storage. Learn about their collaboration.
The post Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi appeared first on Microsoft Research.
( 33
min )
This post is co-written with Aruna Abeyakoon and Denisse Colin from Light and Wonder (L&W). Headquartered in Las Vegas, Light & Wonder, Inc. is the leading cross-platform global game company that provides gambling products and services. Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to […]
( 12
min )
Detecting delirium isn’t easy, but it can have a big payoff: speeding essential care to patients, leading to quicker and surer recovery. Improved detection also reduces the need for long-term skilled care, enhancing the quality of life for patients while decreasing a major financial burden. In the U.S., caring for those suffering from delirium costs […]
( 5
min )
Conquer the lands in Microsoft’s award-winning Age of Empires III: Definitive Edition. It leads 10 new games supported today on GeForce NOW. At Your Command Age of Empires III: Definitive Edition is a remaster of one of the most beloved real-time strategy franchises featuring improved visuals, enhanced gameplay, cross-platform multiplayer and more. Command mighty civilizations […]
( 4
min )
Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). Studio provides a single web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. Lifecycle configurations are shell scripts triggered by Studio lifecycle events, such as starting […]
( 7
min )
As a contact center agent, would you rather focus on having productive customer conversations or get distracted by having to look up customer information and knowledge articles that could exist in various systems? We’ve all been there. Having a productive conversation while multitasking is challenging. A single negative experience may put a dent on a […]
( 7
min )
Amir Anbarestani, an accomplished 3D artist who goes by the moniker Kingsletter, had a “shell of a good time” creating his Space Turtle scene this week In the NVIDIA Studio.
( 7
min )
Whether animating fish fins or fashioning chic outfits for digital characters, creators can tap Marvelous Designer software to compose and tailor assets, clothes and other materials for their 3D workflows.
( 5
min )
Machine learning (ML) engineers have traditionally focused on striking a balance between model training and deployment cost vs. performance. Increasingly, sustainability (energy efficiency) is becoming an additional objective for customers. This is important because training ML models and then using the trained models to make predictions (inference) can be highly energy-intensive tasks. In addition, more […]
( 8
min )
Generative AI will “supercharge” creators across industries and content types, NVIDIA founder and CEO Jensen Huang said today at the Cannes Lions Festival, on the French Riviera. “For the very first time, the creative process can be amplified in content generation, and the content generation could be in any modality — it could be text […]
( 7
min )
Is optimized structural efficiency ‘human’ design? I recently read a paper titled “On the use of Artificial Neural Networks in Topology Optimisation,” about the process of topological optimization. In short, topological optimization is the process of determining the most efficient distribution of structural material for a given design. Typically there are simulation models involved…
The post DSC Weekly 20 June 2023 – Is optimized structural efficiency ‘human’ design? appeared first on Data Science Central.
( 20
min )
In the vast realm of artificial intelligence, few fields have captivated our imagination and pushed the boundaries of possibility quite like computer vision. At the core of this domain of research and innovation lies the ambition to empower technologies for real-world vision-based systems, enabling machines to take in and respond to visual stimuli with unparalleled […]
The post Microsoft at CVPR 2023: Pushing the boundaries of computer vision appeared first on Microsoft Research.
( 16
min )
Amazon SageMaker Studio is a web-based integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. For provisioning Studio in your AWS account and Region, you first need to create an Amazon SageMaker domain—a construct that encapsulates your ML environment. More concretely, a SageMaker domain […]
( 9
min )
Neural architecture search (NAS) for Graph neural networks (GNNs), called
NAS-GNNs, has achieved significant performance over manually designed GNN
architectures. However, these methods inherit issues from the conventional NAS
methods, such as high computational cost and optimization difficulty. More
importantly, previous NAS methods have ignored the uniqueness of GNNs: GNNs
possess expressive power even without training. With the randomly-initialized
weights, we can then seek the optimal architecture parameters via the sparse
coding objective and derive a novel NAS-GNNs method, namely neural architecture
coding (NAC). Consequently, our NAC adopts a no-update scheme on GNNs and runs
efficiently in linear time. Empirical evaluations on multiple GNN
benchmark datasets demonstrate that our approach leads to state-of-the-art
performance, which is up to $200\times$ faster and $18.8\%$ more accurate than
the strong baselines.
( 2
min )
We consider (stochastic) subgradient methods for strongly convex but
potentially nonsmooth non-Lipschitz optimization. We provide new equivalent
dual descriptions (in the style of dual averaging) for the classic subgradient
method, the proximal subgradient method, and the switching subgradient method.
These equivalences enable $O(1/T)$ convergence guarantees in terms of both
their classic primal gap and a not previously analyzed dual gap for strongly
convex optimization. Consequently, our theory provides these classic methods
with simple, optimal stopping criteria and optimality certificates at no added
computational cost. Our results apply under nearly any stepsize selection and
for a range of non-Lipschitz ill-conditioned problems where the early
iterations of the subgradient method may diverge exponentially quickly (a
phenomenon which, to the best of our knowledge, no prior work addresses). Even
in the presence of such undesirable behaviors, our theory still ensures and
bounds eventual convergence.
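A minimal sketch of the classic subgradient method on a strongly convex nonsmooth function, with the $O(1/T)$-rate weighted averaging of iterates (the paper's dual descriptions and optimality certificates are not reproduced here):

```python
import numpy as np

mu = 1.0  # strong convexity modulus of f below

def f(x):
    # 1-strongly convex and nonsmooth at 0: f(x) = |x| + (mu/2) x^2, minimum 0 at x = 0.
    return abs(x) + 0.5 * mu * x**2

def subgrad(x):
    # A subgradient of f at x (np.sign(0) = 0 is a valid choice at the kink).
    return np.sign(x) + mu * x

# Subgradient method with steps 2 / (mu * (t + 1)) and t-weighted averaging,
# which attains the O(1/T) primal gap for strongly convex objectives.
x, num, den = 5.0, 0.0, 0.0
for t in range(1, 1001):
    num += t * x
    den += t
    x = x - (2.0 / (mu * (t + 1))) * subgrad(x)
x_avg = num / den
```

The averaged iterate `x_avg` is what the classic guarantee controls; the paper adds a computable dual gap that certifies this without knowing the optimum.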
( 2
min )
Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing
neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being
popular choices for cycling through random or single permutations of the
training data. However, the convergence properties of these algorithms in the
non-convex case are not fully understood. Existing results suggest that, in
realistic training scenarios where the number of epochs is smaller than the
training set size, RR may perform worse than SGD.
In this paper, we analyze a general SGD algorithm that allows for arbitrary
data orderings and show improved convergence rates for non-convex functions.
Specifically, our analysis reveals that SGD with random and single shuffling is
always faster or at least as good as classical SGD with replacement, regardless
of the number of iterations. Overall, our study highlights the benefits of
using SGD with random/single shuffling and provides new insights into its
convergence properties for non-convex optimization.
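The three data orderings being compared can be sketched in a few lines (an illustrative toy, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Interpolating least-squares toy problem: minimize (1/n) sum_i (a_i^T x - b_i)^2.
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)

def loss(x):
    return float(np.mean((A @ x - b) ** 2))

def sgd(x, epochs, lr, scheme):
    """scheme: 'rr' reshuffles every epoch, 'ss' reuses one fixed permutation,
    'replacement' samples i.i.d. -- the three orderings discussed above."""
    perm = rng.permutation(n)  # the single shuffle, drawn once
    for _ in range(epochs):
        if scheme == "rr":
            order = rng.permutation(n)
        elif scheme == "ss":
            order = perm
        else:
            order = rng.integers(0, n, size=n)
        for i in order:
            g = 2.0 * (A[i] @ x - b[i]) * A[i]  # gradient of the i-th term
            x = x - lr * g
    return x

x_rr = sgd(np.zeros(d), epochs=20, lr=0.01, scheme="rr")
```

Shuffled schemes visit every sample exactly once per epoch, which is the property the improved rates exploit.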
( 2
min )
While several recent works have identified societal-scale and
extinction-level risks to humanity arising from artificial intelligence, few
have attempted an exhaustive taxonomy of such risks. Many exhaustive
taxonomies are possible, and some are useful -- particularly if they reveal new
risks or practical approaches to safety. This paper explores a taxonomy based
on accountability: whose actions lead to the risk, are the actors unified, and
are they deliberate? We also provide stories to illustrate how the various risk
types could each play out, including risks arising from unanticipated
interactions of many AI systems, as well as risks from deliberate misuse, for
which combined technical and policy solutions are indicated.
( 2
min )
Spectral-temporal graph neural network is a promising abstraction underlying
most time series forecasting models that are based on graph neural networks
(GNNs). However, little is known about the underpinnings of this branch
of methods. In this paper, we establish a theoretical framework that unravels
the expressive power of spectral-temporal GNNs. Our results show that linear
spectral-temporal GNNs are universal under mild assumptions, and their
expressive power is bounded by our extended first-order Weisfeiler-Leman
algorithm on discrete-time dynamic graphs. To make our findings useful in
practice for valid instantiations, we discuss related constraints in detail and
outline a theoretical blueprint for designing spatial and temporal modules in
spectral domains. Building on these insights and to demonstrate how powerful
spectral-temporal GNNs are based on our framework, we propose a simple
instantiation named Temporal Graph GegenConv (TGC), which significantly
outperforms most existing models with only linear components and shows better
model efficiency.
( 2
min )
This paper presents a local energy distribution based hyperparameter
determination for stochastic simulated annealing (SSA). SSA is capable of
solving combinatorial optimization problems faster than typical simulated
annealing (SA), but requires a time-consuming hyperparameter search. The
proposed method determines hyperparameters based on the local energy
distributions of spins (probabilistic bits). The spin is a basic computing
element of SSA and is graphically connected to other spins with its weights.
The distribution of the local energy can be estimated based on the central
limit theorem (CLT). The CLT-based normal distribution is used to determine the
hyperparameters, which reduces the time complexity for hyperparameter search
from O(n^3) of the conventional method to O(1). The performance of SSA with the
determined hyperparameters is evaluated on the Gset and K2000 benchmarks for
maximum-cut problems. The results show that the proposed method achieves mean
cut values of approximately 98% of the best-known cut values.
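The core idea, estimating each spin's local-energy spread directly from its weights via the CLT instead of searching for it, can be sketched as follows (an illustrative rule, not the paper's exact formula):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Ising-style instance: n spins with symmetric couplings W, zero diagonal.
n = 100
W = rng.normal(size=(n, n))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)

# Local energy of spin i under configuration s in {-1, +1}^n: h_i = sum_j W_ij s_j.
# For roughly independent symmetric spins, E[h_i] = 0 and Var[h_i] = sum_j W_ij^2,
# so by the CLT h_i is approximately N(0, sigma_i^2). Reading sigma_i off the
# weights is a constant-cost computation, replacing a search over annealing runs.
sigma = np.sqrt((W**2).sum(axis=1))

# CLT-style hyperparameter choice: scale the annealing noise to the typical
# local-energy spread.
noise_scale = float(sigma.mean())
```

This is what collapses the hyperparameter-search cost from O(n^3) to O(1) per spin in the abstract's accounting.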
( 2
min )
The privacy-utility tradeoff remains one of the fundamental issues of
differentially private machine learning. This paper introduces a geometrically
inspired kernel-based approach to mitigate the accuracy-loss issue in
classification. In this approach, a representation of the affine hull of given
data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads
to a novel distance measure that hides privacy-sensitive information about
individual data points and improves the privacy-utility tradeoff via
significantly reducing the risk of membership inference attacks. The
effectiveness of the approach is demonstrated through experiments on MNIST
dataset, Freiburg groceries dataset, and a real biomedical dataset. It is
verified that the approach remains computationally practical. The application
of the approach to federated learning is considered, and it is observed that
the accuracy loss due to the data being distributed is marginal at worst.
( 2
min )
The intersection of machine learning and dynamical systems has generated
considerable interest recently. Neural Ordinary Differential Equations (NODEs)
represent a rich overlap between these fields. In this paper, we develop a
continuous time neural network approach based on Delay Differential Equations
(DDEs). Our model uses the adjoint sensitivity method to learn the model
parameters and delay directly from data. Our approach is inspired by that of
NODEs and extends earlier neural DDE models, which have assumed that the value
of the delay is known a priori. We perform a sensitivity analysis on our
proposed approach and demonstrate its ability to learn DDE parameters from
benchmark systems. We conclude our discussion with potential future directions
and applications.
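A bare-bones fixed-step integrator for a constant-delay DDE illustrates the model class (forward Euler for clarity; the paper's adjoint-based learning of the parameters and delay is not shown):

```python
import numpy as np

def integrate_dde(f, x0, tau, dt, T):
    """Forward-Euler integration of dx/dt = f(x(t), x(t - tau)) with constant
    history x(t) = x0 for t <= 0. A neural DDE replaces f by a network and
    differentiates through (or adjoints through) a loop like this one."""
    lag = int(round(tau / dt))
    xs = [x0] * (lag + 1)          # history buffer covering one full delay
    steps = int(round(T / dt))
    for _ in range(steps):
        x, x_delayed = xs[-1], xs[-1 - lag]
        xs.append(x + dt * f(x, x_delayed))
    return np.array(xs[lag:])

# Benchmark-style example: the delayed logistic equation dx/dt = x(t) * (1 - x(t - tau)),
# which settles to the equilibrium x = 1 for this delay.
traj = integrate_dde(lambda x, xd: x * (1 - xd), x0=0.5, tau=1.0, dt=0.01, T=20.0)
```

The key difference from a NODE is the history buffer: the derivative depends on the state one delay ago, not just the current state.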
( 2
min )
Out-of-distribution (OOD) generalization deals with the prevalent learning
scenario where test distribution shifts from training distribution. With rising
application demands and inherent complexity, graph OOD problems call for
specialized solutions. While data-centric methods exhibit performance
enhancements on many generic machine learning tasks, there is a notable absence
of data augmentation methods tailored for graph OOD generalization. In this
work, we propose to achieve graph OOD generalization with the novel design of
non-Euclidean-space linear extrapolation. The proposed augmentation strategy
extrapolates both structure and feature spaces to generate OOD graph data. Our
design tailors OOD samples for specific shifts without corrupting underlying
causal mechanisms. Theoretical analysis and empirical results demonstrate the
effectiveness of our method in addressing target shifts, showing substantial
and consistent improvements across various graph OOD tasks.
( 2
min )
Due to the popularity of Graph Neural Networks (GNNs), various GNN-based
methods have been designed to reason on knowledge graphs (KGs). An important
design component of GNN-based KG reasoning methods is called the propagation
path, which contains a set of involved entities in each propagation step.
Existing methods use hand-designed propagation paths, ignoring the correlation
between the entities and the query relation. In addition, the number of
involved entities will explosively grow at larger propagation steps. In this
work, we are motivated to learn an adaptive propagation path in order to filter
out irrelevant entities while preserving promising targets. First, we design an
incremental sampling mechanism where the nearby targets and layer-wise
connections can be preserved with linear complexity. Second, we design a
learning-based sampling distribution to identify the semantically related
entities. Extensive experiments show that our method is powerful, efficient,
and semantic-aware. The code is available at
https://github.com/LARS-research/AdaProp.
( 2
min )
Score-based generative models (SGMs) learn a family of noise-conditional
score functions corresponding to the data density perturbed with increasingly
large amounts of noise. These perturbed data densities are linked together by
the Fokker-Planck equation (FPE), a partial differential equation (PDE)
governing the spatial-temporal evolution of a density undergoing a diffusion
process. In this work, we derive a corresponding equation called the score FPE
that characterizes the noise-conditional scores of the perturbed data densities
(i.e., their gradients). Surprisingly, despite the impressive empirical
performance, we observe that scores learned through denoising score matching
(DSM) fail to fulfill the underlying score FPE, which is an inherent
self-consistency property of the ground truth score. We prove that satisfying
the score FPE is desirable as it improves the likelihood and the degree of
conservativity. Hence, we propose to regularize the DSM objective to enforce
satisfaction of the score FPE, and we show the effectiveness of this approach
across various datasets.
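For reference, the Fokker-Planck equation in question: for a forward diffusion $dx_t = f(x_t, t)\,dt + g(t)\,dw_t$, the perturbed densities $p_t$ evolve as

```latex
\partial_t p_t(x) = -\nabla_x \cdot \big( f(x, t)\, p_t(x) \big)
                    + \tfrac{1}{2}\, g(t)^2\, \Delta_x p_t(x).
```

The score FPE of the paper is the corresponding self-consistency constraint on the scores $\nabla_x \log p_t$; its exact form is derived in the paper.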
( 2
min )
CORL is an open-source library that provides thoroughly benchmarked
single-file implementations of both deep offline and offline-to-online
reinforcement learning algorithms. It emphasizes a simple developing experience
with a straightforward codebase and a modern analysis tracking tool. In CORL,
we isolate methods implementation into separate single files, making
performance-relevant details easier to recognize. Additionally, an experiment
tracking feature is available to help log metrics, hyperparameters,
dependencies, and more to the cloud. Finally, we have ensured the reliability
of the implementations by benchmarking on commonly employed D4RL datasets,
providing a transparent source of results that can be reused for robust
evaluation tools such as performance profiles, probability of improvement, or
expected online performance.
( 2
min )
Novel test selectors used in simulation-based verification have been shown to
significantly accelerate coverage closure regardless of the number of coverage
holes. This paper presents a configurable and highly-automated framework for
novel test selection based on neural networks. Three configurations of this
framework are tested with a commercial signal processing unit. All three
convincingly outperform random test selection, with the largest simulation
saving being 49.37% to reach 99.5% coverage. The computational expense of
the configurations is negligible compared to the simulation reduction. We
compare the experimental results and discuss important characteristics related
to the performance of the configurations.
( 2
min )
The quasiparticle effective mass $m^\ast$ of interacting electrons is a
fundamental quantity in the Fermi liquid theory. However, the precise value of
the effective mass of uniform electron gas is still elusive after decades of
research. The newly developed neural canonical transformation approach [Xie et
al., J. Mach. Learn. 1, (2022)] offers a principled way to extract the
effective mass of electron gas by directly calculating the thermal entropy at
low temperature. The approach models a variational many-electron density matrix
using two generative neural networks: an autoregressive model for momentum
occupation and a normalizing flow for electron coordinates. Our calculation
reveals a suppression of effective mass in the two-dimensional spin-polarized
electron gas, which is more pronounced than previous reports in the low-density
strong-coupling region. This prediction calls for verification in
two-dimensional electron gas experiments.
( 2
min )
The transition to a fully renewable energy grid requires better forecasting
of demand at the low-voltage level to increase efficiency and ensure reliable
control. However, high fluctuations and increasing electrification cause huge
forecast variability that is not reflected in traditional point estimates.
Probabilistic load forecasts take future uncertainties into account and thus
allow more informed decision-making for the planning and operation of
low-carbon energy systems. We propose an approach for flexible conditional
density forecasting of short-term load based on Bernstein polynomial
normalizing flows, where a neural network controls the parameters of the flow.
In an empirical study with 363 smart meter customers, our density predictions
compare favorably against Gaussian and Gaussian mixture densities. Also, they
outperform a non-parametric approach based on the pinball loss for 24h-ahead
load forecasting for two different neural network architectures.
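The monotone transform at the heart of such flows can be sketched directly (our own illustration; the conditioning neural network, which outputs the coefficients, is omitted):

```python
import numpy as np
from math import comb

def bernstein_transform(y, theta):
    """Evaluate a Bernstein polynomial of degree M = len(theta) - 1 at y in [0, 1]:
    B(y) = sum_k theta_k * C(M, k) * y^k * (1 - y)^(M - k).
    With non-decreasing coefficients theta the map is monotone, which is what
    makes it usable as a normalizing-flow transformation."""
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y, dtype=float)
    M = len(theta) - 1
    basis = np.array([comb(M, k) * y**k * (1 - y)**(M - k) for k in range(M + 1)])
    return np.tensordot(theta, basis, axes=1)
```

In the forecasting model, a neural network maps the input features to `theta`, making the resulting density conditional on covariates.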
( 2
min )
We study the loss landscape of training problems for deep artificial neural
networks with a one-dimensional real output whose activation functions contain
an affine segment and whose hidden layers have width at least two. It is shown
that such problems possess a continuum of spurious (i.e., not globally optimal)
local minima for all target functions that are not affine. In contrast to
previous works, our analysis covers all sampling and parameterization regimes,
general differentiable loss functions, arbitrary continuous nonpolynomial
activation functions, and both the finite- and infinite-dimensional setting. It
is further shown that the appearance of the spurious local minima in the
considered training problems is a direct consequence of the universal
approximation theorem and that the underlying mechanisms also cause, e.g.,
$L^p$-best approximation problems to be ill-posed in the sense of Hadamard for
all networks that do not have a dense image. The latter result also holds
without the assumption of local affine linearity and without any conditions on
the hidden layers.
( 2
min )
In the pursuit of artificial general intelligence (AGI), we tackle
Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged
approach. We employ the Decision Transformer in an imitation learning paradigm
to model human problem-solving, and introduce an object detection algorithm,
the Push and Pull clustering method. This dual strategy enhances AI's ARC
problem-solving skills and provides insights for AGI progression. Yet, our work
reveals the need for advanced data collection tools, robust training datasets,
and refined model structures. This study highlights potential improvements for
Decision Transformers and propels future AGI research.
( 2
min )
We consider the problem of recovering a latent graph where the observations
at each node are \emph{aliased}, and transitions are stochastic. Observations
are gathered by an agent traversing the graph. Aliasing means that multiple
nodes emit the same observation, so the agent can not know in which node it is
located. The agent needs to uncover the hidden topology as accurately as
possible and in as few steps as possible. This is equivalent to efficient
recovery of the transition probabilities of a partially observable Markov
decision process (POMDP) in which the observation probabilities are known. An
algorithm for efficiently exploring (and ultimately recovering) the latent
graph is provided. Our approach is exponentially faster than naive exploration
in a variety of challenging topologies with aliased observations while
remaining competitive with existing baselines in the unaliased regime.
( 2
min )
We develop information-geometric techniques to analyze the trajectories of
the predictions of deep networks during training. By examining the underlying
high-dimensional probabilistic models, we reveal that the training process
explores an effectively low-dimensional manifold. Networks with a wide range of
architectures and sizes, trained using different optimization methods,
regularization techniques, data augmentation techniques, and weight
initializations, lie on the same manifold in the prediction space. We study the
details of this manifold to find that networks with different architectures
follow distinguishable trajectories but other factors have a minimal influence;
larger networks train along a similar manifold as that of smaller networks,
just faster; and networks initialized at very different parts of the prediction
space converge to the solution along a similar manifold.
( 2
min )
We propose a new method for optimistic planning in infinite-horizon
discounted Markov decision processes based on the idea of adding regularization
to the updates of an otherwise standard approximate value iteration procedure.
This technique allows us to avoid contraction and monotonicity arguments
typically required by existing analyses of approximate dynamic programming
methods, and in particular to use approximate transition functions estimated
via least-squares procedures in MDPs with linear function approximation. We use
our method to recover known guarantees in tabular MDPs and to provide a
computationally efficient algorithm for learning near-optimal policies in
discounted linear mixture MDPs from a single stream of experience, and show it
achieves near-optimal statistical guarantees.
( 2
min )
It is shown that over-parameterized neural networks can achieve minimax
optimal rates of convergence (up to logarithmic factors) for learning functions
from certain smooth function classes, if the weights are suitably constrained
or regularized. Specifically, we consider the nonparametric regression of
estimating an unknown $d$-variate function by using shallow ReLU neural
networks. It is assumed that the regression function is from the Hölder space
with smoothness $\alpha<(d+3)/2$ or a variation space corresponding to shallow
neural networks, which can be viewed as an infinitely wide neural network. In
this setting, we prove that least squares estimators based on shallow neural
networks with certain norm constraints on the weights are minimax optimal, if
the network width is sufficiently large. As a byproduct, we derive a new
size-independent bound for the local Rademacher complexity of shallow ReLU
neural networks, which may be of independent interest.
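For context, the classical minimax benchmark being matched (standard nonparametric statistics, not specific to this paper): over an $\alpha$-Hölder ball in $d$ dimensions,

```latex
\inf_{\hat f_n} \sup_{f} \; \mathbb{E}\, \| \hat f_n - f \|_{L^2}^2
  \;\asymp\; n^{-\frac{2\alpha}{2\alpha + d}},
```

which the constrained shallow-network least squares estimators attain up to logarithmic factors.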
( 2
min )
AI Weirdness: the strange side of machine learning
( 2
min )
This content was previously published by Nature Portfolio and Springer Nature Communities on Nature Portfolio Earth and Environment Community. Improving our ability to forecast the weather and climate is of interest to all sectors of the economy and to government agencies from the local to the national level. Weather forecasts zero to ten days ahead and […]
The post Improving Subseasonal Forecasting with Machine Learning appeared first on Microsoft Research.
( 11
min )
Healthcare has undergone consistent changes. From surgeries that were conducted by sedating patients with opium and amputation of an infected part, the technology has evolved to anesthesia and new & innovative ways of treating bacterial infection. We can now conduct LASIK surgeries and have even healed conjoined twins from the head. With so much advancement,…
The post Enhancing patient care with AI-powered health monitoring and remote patient management appeared first on Data Science Central.
( 22
min )
At SambaSafety, their mission is to promote safer communities by reducing risk through data insights. Since 1998, SambaSafety has been the leading North American provider of cloud-based mobility risk management software for organizations with commercial and non-commercial drivers. SambaSafety serves more than 15,000 global employers and insurance carriers with driver risk and compliance monitoring, online […]
( 6
min )
Posted by Juhyun Lee and Raman Sarokin, Software Engineers, Core Systems & Experiences
The proliferation of large diffusion models for image generation has led to a significant increase in model size and inference workloads. On-device ML inference in mobile environments requires meticulous performance optimization and consideration of trade-offs due to resource constraints. Running inference of large diffusion models (LDMs) on-device, driven by the need for cost efficiency and user privacy, presents even greater challenges due to the substantial memory requirements and computational demands of these models.
We address this challenge in our work titled “Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations” (to be presented at the CVPR 20…
( 92
min )
NVIDIA will be showcased next week as the winner of the fiercely contested 3D Occupancy Prediction Challenge for autonomous driving development at the Computer Vision and Pattern Recognition Conference (CVPR), in Vancouver, Canada. The competition had more than 400 submissions from nearly 150 teams across 10 regions. 3D occupancy prediction is the process of forecasting […]
( 6
min )
Xbox Game Pass support is coming to GeForce NOW. Members will soon be able to play supported PC games from the Xbox Game Pass catalog through NVIDIA’s cloud gaming servers. Learn more about how support for Game Pass and Microsoft Store will roll out in the coming months. Plus, Age of Empires IV: Anniversary Edition […]
( 5
min )
MIT-Novo Nordisk Artificial Intelligence Postdoctoral Fellows Program will support up to 10 postdocs annually over five years.
( 7
min )
MIT postdoc Ziv Epstein SM ’19, PhD ’23 discusses issues arising from the use of generative AI to make art and other media.
( 9
min )
Dive into Deep Learning (D2L.ai) is an open-source textbook that makes deep learning accessible to everyone. It features interactive Jupyter notebooks with self-contained code in PyTorch, JAX, TensorFlow, and MXNet, as well as real-world examples, exposition figures, and math. So far, D2L has been adopted by more than 400 universities around the world, such as […]
( 9
min )
My son and I both had stints in 2022 at low-volume, high-margin, long-established manufacturers. My son was doing assembly for a power management system maker with renewable energy/smart grid customers. I was doing compliance document management and related analytics for a bioengineering equipment maker. Both companies had some of the same nagging challenges. My son…
The post Data-centric development: A hypothetical tech manufacturing example appeared first on Data Science Central.
( 21
min )
ChatGPT has been one of the biggest revelations of our time. Users across generations have experienced a technically advanced system that delivers benefits to diverse sectors. As per reports from Investingnews.com, OpenAI's ChatGPT has a market value of USD 29 billion, with the company's recent 2023 forecast reflecting a 150%…
The post Prompt engineering and few-shot learning: An experience beyond data science appeared first on Data Science Central.
( 21
min )
Every organization has its own set of standards and practices that provide security and governance for their AWS environment. Amazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. SageMaker provides a set of templates […]
( 14
min )
When California skies turned orange in the wake of devastating wildfires, a startup fused computer vision and generative AI to fight back. “With the 2020 wildfires, it became very personal, so we asked fire officials how we could help,” said Emrah Gultekin, the Turkish-born CEO of Chooch, a Silicon Valley-based leader in computer vision. California […]
( 6
min )
With over 900,000 subscribers on her YouTube channel, editor and filmmaker Sara Dietschy creates docuseries, reviews and vlogs that explore the intersection of technology and creativity.
( 6
min )
B2B sales requires effective prediction of customer growth, identification of
upsell potential, and mitigation of churn risks. LinkedIn sales representatives
traditionally relied on intuition and fragmented data signals to assess
customer performance. This resulted in significant time investment in data
understanding as well as strategy formulation and under-investment in active
selling. To overcome this challenge, we developed a data product called Account
Prioritizer, an intelligent sales account prioritization engine. It uses
machine learning recommendation models and integrated account-level explanation
algorithms within the sales CRM to automate the manual process of sales book
prioritization. A successful A/B test demonstrated that the Account Prioritizer
generated a substantial +8.08% increase in renewal bookings for the LinkedIn
Business.
( 2
min )
We present Malafide, a universal adversarial attack against automatic speaker
verification (ASV) spoofing countermeasures (CMs). By introducing convolutional
noise using an optimised linear time-invariant filter, Malafide attacks can be
used to compromise CM reliability while preserving other speech attributes such
as quality and the speaker's voice. In contrast to other adversarial attacks
proposed recently, Malafide filters are optimised independently of the input
utterance and duration, are tuned instead to the underlying spoofing attack,
and require the optimisation of only a small number of filter coefficients.
Even so, they degrade CM performance estimates by an order of magnitude, even
in black-box settings, and can also be configured to overcome integrated CM and
ASV subsystems. Integrated solutions that use self-supervised learning CMs,
however, are more robust, under both black-box and white-box settings.
( 2
min )
Research in 3D semantic segmentation has been increasing performance metrics,
such as the IoU, by scaling model complexity and computational resources, leaving
behind researchers and practitioners who (1) cannot access the necessary
resources and (2) need transparency on the model decision mechanisms. In
this paper, we propose SCENE-Net, a low-resource white-box model for 3D point
cloud semantic segmentation. SCENE-Net identifies signature shapes on the point
cloud via group equivariant non-expansive operators (GENEOs), providing
intrinsic geometric interpretability. Our training time on a laptop is 85 min,
and our inference time is 20 ms. SCENE-Net has 11 trainable geometrical
parameters and requires fewer data than black-box models. SCENE-Net offers
robustness to noisy labeling and data imbalance and has IoU comparable to
state-of-the-art methods. With this paper, we release a 40,000 km labeled
dataset of rural terrain point clouds and our code implementation.
( 2
min )
We study the problem of discrete distribution estimation in KL divergence and
provide concentration bounds for the Laplace estimator. We show that the
deviation from the mean scales as $\sqrt{k}/n$ when $n \ge k$, improving upon the
best prior result of $k/n$. We also establish a matching lower bound that shows
that our bounds are tight up to polylogarithmic factors.
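To make the object of study concrete, here is a minimal sketch of the Laplace (add-one) estimator and its KL deviation on synthetic data; the data-generating setup is illustrative, not from the paper:

```python
import numpy as np

def laplace_estimator(counts, k):
    # Add-one smoothing: p_hat(i) = (n_i + 1) / (n + k)
    n = counts.sum()
    return (counts + 1.0) / (n + k)

def kl(p, q):
    # KL(p || q); q > 0 everywhere by construction of the Laplace estimator
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
k, n = 10, 10_000
p = rng.dirichlet(np.ones(k))            # unknown discrete distribution on k symbols
samples = rng.choice(k, size=n, p=p)
counts = np.bincount(samples, minlength=k)
p_hat = laplace_estimator(counts, k)
print(kl(p, p_hat))                      # small when n >> k, as the bound predicts
```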
( 2
min )
We consider a cross-silo federated learning (FL) setting where a machine
learning model with a fully connected first layer is trained between different
clients and a central server using FedAvg, and where the aggregation step can
be performed with secure aggregation (SA). We present SRATTA, an attack relying
only on aggregated models which, under realistic assumptions, (i) recovers data
samples from the different clients, and (ii) groups data samples coming from
the same client together. While sample recovery has already been explored in an
FL setting, the ability to group samples per client, despite the use of SA, is
novel. This poses a significant unforeseen security threat to FL and
effectively breaks SA. We show that SRATTA is both theoretically grounded and
can be used in practice on realistic models and datasets. We also propose
counter-measures, and claim that clients should play an active role to
guarantee their privacy during training.
( 2
min )
We consider policy optimization in contextual bandits, where one is given a
fixed dataset of logged interactions. While pessimistic regularizers are
typically used to mitigate distribution shift, prior implementations thereof
are not computationally efficient. We present the first oracle-efficient
algorithm for pessimistic policy optimization: it reduces to supervised
learning, leading to broad applicability. We also obtain best-effort
statistical guarantees analogous to those for pessimistic approaches in prior
work. We instantiate our approach for both discrete and continuous actions. We
perform extensive experiments in both settings, showing advantage over
unregularized policy optimization across a wide range of configurations.
( 2
min )
The internal functional behavior of trained Deep Neural Networks is
notoriously difficult to interpret. Activation-maximization approaches are one
set of techniques used to interpret and analyze trained deep-learning models.
These consist of finding inputs that maximally activate a given neuron or
feature map. These inputs can be selected from a data set or obtained by
optimization. However, interpretability methods may be subject to being
deceived. In this work, we consider the concept of an adversary manipulating a
model for the purpose of deceiving the interpretation. We propose an
optimization framework for performing this manipulation and demonstrate a
number of ways that popular activation-maximization interpretation techniques
associated with CNNs can be manipulated to change the interpretations, shedding
light on the reliability of these methods.
( 2
min )
This paper proposes an extension of Random Projection Depth (RPD) to cope
with multiple modalities and non-convexity on data clouds. In the framework of
the proposed method, the RPD is computed in a reproducing kernel Hilbert space.
With the help of kernel principal component analysis, we expect that the
proposed method can cope with the above multiple modalities and non-convexity.
The experimental results demonstrate that the proposed method outperforms RPD
and is comparable to other existing detection models on benchmark datasets
regarding Area Under the Curves (AUCs) of Receiver Operating Characteristic
(ROC).
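A minimal sketch of the idea: kernel PCA features (computed here by hand via an eigendecomposition of the centered Gram matrix) followed by random projection depth. The kernel, bandwidth, and data are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kpca_features(X, n_components, gamma):
    # Kernel PCA via eigendecomposition of the centered Gram matrix.
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ rbf_kernel(X, X, gamma) @ H
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 1e-12, None))

def random_projection_depth(F, n_dirs=100, seed=0):
    # Depth(x) = 1 / (1 + sup_u |u.f(x) - median| / MAD) over random directions u.
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n_dirs, F.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    P = F @ U.T                                # (n_points, n_dirs) projections
    med = np.median(P, axis=0)
    mad = np.median(np.abs(P - med), axis=0) + 1e-12
    return 1.0 / (1.0 + np.max(np.abs(P - med) / mad, axis=1))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), [[8.0, 8.0]]])   # one clear outlier
depth = random_projection_depth(kpca_features(X, 2, gamma=0.01))
print(depth[-1], np.median(depth))         # the outlier should receive low depth
```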
( 2
min )
When using adversarial training, it is common practice to train against the
most egregious failures. However, this might imply using examples with
sensitive information (such as leaked passwords or security vulnerabilities) as
training data. One might assume that language models trained with gradient
descent never generate text snippets which were only present in examples
associated with the lowest possible reward. In this paper, we show that this
assumption is wrong: in some situations, large language models do learn from
such negatively-reinforced examples. We present a specific training setup that
enables Pythia-160M to generate passwords with a probability slightly greater
than chance, despite only showing it these passwords on examples where the
model is incentivized to not output these passwords. Our code is available at
https://github.com/FabienRoger/Learning-From-Negative-Examples
( 2
min )
A framework to learn a multi-modal distribution is proposed, denoted as the
Conditional Quantum Generative Adversarial Network (C-qGAN). The neural network
structure is strictly within a quantum circuit and, as a consequence, is shown
to represent a more efficient state preparation procedure than current methods.
This methodology has the potential to speed up algorithms, such as Monte Carlo
analysis. In particular, after demonstrating the effectiveness of the network
in the learning task, the technique is applied to price Asian option
derivatives, providing the foundation for further research on other
path-dependent options.
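For context, the classical Monte Carlo baseline such a method aims to accelerate looks as follows: pricing an arithmetic-average Asian call under geometric Brownian motion. The parameter values are illustrative assumptions:

```python
import numpy as np

def asian_call_mc(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                  n_steps=50, n_paths=20_000, seed=0):
    """Classical Monte Carlo price of an arithmetic-average Asian call under GBM."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    # Log-price increments accumulated along each path.
    log_paths = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
    S = S0 * np.exp(log_paths)
    payoff = np.maximum(S.mean(axis=1) - K, 0.0)   # average-price payoff
    return float(np.exp(-r * T) * payoff.mean())   # discounted expectation

price = asian_call_mc()
print(price)
```

Quantum amplitude estimation promises a quadratic reduction in the number of samples needed for the same accuracy, which is where efficient state preparation matters.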
( 2
min )
Large-scale language models, like ChatGPT, have garnered significant media
attention and stunned the public with their remarkable capacity for generating
coherent text from short natural language prompts. In this paper, we aim to
conduct a systematic inspection of ChatGPT's performance in two controllable
generation tasks, with respect to ChatGPT's ability to adapt its output to
different target audiences (expert vs. layman) and writing styles (formal vs.
informal). Additionally, we evaluate the faithfulness of the generated text,
and compare the model's performance with human-authored texts. Our findings
indicate that the stylistic variations produced by humans are considerably
larger than those demonstrated by ChatGPT, and the generated texts diverge from
human samples in several characteristics, such as the distribution of word
types. Moreover, we observe that ChatGPT sometimes incorporates factual errors
or hallucinations when adapting the text to suit a specific style.
( 2
min )
Generative model-based deep clustering frameworks excel in classifying
complex data, but are limited in handling dynamic and complex features because
they require prior knowledge of the number of clusters. In this paper, we
propose a nonparametric deep clustering framework that employs an infinite
mixture of Gaussians as a prior. Our framework utilizes a memoized online
variational inference method that enables the "birth" and "merge" moves of
clusters, allowing our framework to cluster data in a "dynamic-adaptive"
manner, without requiring prior knowledge of the number of clusters. We name
the framework DIVA, a Dirichlet Process-based Incremental deep clustering
framework via Variational Auto-Encoder. Our framework, which outperforms
state-of-the-art baselines, exhibits superior performance in classifying
complex data with dynamically changing features, particularly in the case of
incremental features. We released our source code implementation at:
https://github.com/Ghiara/diva
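DIVA's memoized online variational inference is not available off the shelf, but the core idea of a Dirichlet-process prior inferring the number of clusters can be sketched with scikit-learn's variational DP Gaussian mixture (an illustrative stand-in, not the paper's method):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Three well-separated clusters; the model only sees an upper bound of 10.
X = np.vstack([rng.normal(c, 0.3, (100, 2)) for c in (-4.0, 0.0, 4.0)])

dpgmm = BayesianGaussianMixture(
    n_components=10,                              # truncation level, not the true count
    weight_concentration_prior_type="dirichlet_process",
    random_state=0, max_iter=500,
)
labels = dpgmm.fit_predict(X)
active = int((dpgmm.weights_ > 0.05).sum())       # clusters with non-negligible weight
print(active)
```

DIVA replaces the Gaussian observation model with a VAE latent space and adds explicit "birth" and "merge" moves so the cluster count can track incrementally arriving features.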
( 2
min )
Accurate early congestion prediction can prevent unpleasant surprises at the
routing stage, playing a crucial role in helping designers iterate
faster in VLSI design cycles. In this paper, we introduce a novel strategy to
fully incorporate topological and geometrical features of circuits by making
several key designs in our network architecture. To be more specific, we
construct two individual graphs (geometry-graph, topology-graph) with distinct
edge construction schemes according to their unique properties. We then propose
a dual-branch network with different encoder layers in each pathway and
aggregate representations with a sophisticated fusion strategy. Our network,
named HybridNet, not only provides a simple yet effective way to capture the
geometric interactions of cells, but also preserves the original topological
relationships in the netlist. Experimental results on the ISPD2015 benchmarks
show that we achieve an improvement of 10.9% compared to previous methods.
( 2
min )
Diffusion Models (DMs) are state-of-the-art generative models that learn a
reversible corruption process from iterative noise addition and denoising. They
are the backbone of many generative AI applications, such as text-to-image
conditional generation. However, recent studies have shown that basic
unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a
type of output manipulation attack triggered by a maliciously embedded pattern
at model input. This paper presents a unified backdoor attack framework
(VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our
framework covers mainstream unconditional and conditional DMs (denoising-based
and score-based) and various training-free samplers for holistic evaluations.
Experiments show that our unified framework facilitates the backdoor analysis
of different DM configurations and provides new insights into caption-based
backdoor attacks on DMs.
( 2
min )
Over the last decade, deep neural networks have achieved state of the art in
computer vision tasks. These models, however, are susceptible to unusual
inputs, known as adversarial examples, that cause them to misclassify or
otherwise fail to detect objects. Here, we provide evidence that the increasing
success of adversarial attacks is primarily due to their increasing size. We
then demonstrate a method for generating the largest possible adversarial patch
by building an adversarial pattern out of repeatable elements. This approach
achieves a new state of the art in evading detection by YOLOv2 and YOLOv3.
Finally, we present an experiment that fails to replicate the prior success of
several attacks published in this field, and end with some comments on testing
and reproducibility.
( 2
min )
One of the grand challenges of reinforcement learning is the ability to
generalize to new tasks. However, general agents require a set of rich, diverse
tasks to train on. Designing a `foundation environment' for such tasks is
tricky -- the ideal environment would support a range of emergent phenomena, an
expressive task space, and fast runtime. To take a step towards addressing this
research bottleneck, this work presents Powderworld, a lightweight yet
expressive simulation environment running directly on the GPU. Within
Powderworld, two motivating challenge distributions are presented, one for
world-modelling and one for reinforcement learning. Each contains hand-designed
test tasks to examine generalization. Experiments indicate that increasing the
environment's complexity improves generalization for world models and certain
reinforcement learning agents, yet may inhibit learning in high-variance
environments. Powderworld aims to support the study of generalization by
providing a source of diverse tasks arising from the same core rules.
( 2
min )
Images cannot always be expected to come in a certain standard format and
orientation. Deep networks need to be trained to take into account unexpected
variations in orientation or format. For this purpose, training data should be
enriched to include different conditions. In this study, the effects of data
enrichment on the performance of deep networks in the super resolution problem
were investigated experimentally. A total of six basic image transformations
were used for the enrichment procedures. In the experiments, two deep network
models were trained with variants of the ILSVRC2012 dataset enriched by these
six image transformation processes. Considering a single image transformation,
data enriched with 180-degree rotation provided the best results, while the
worst results were obtained when the models were trained on data enriched by
the flip-upside-down process.
Models scored highest when trained with a mix of all transformations.
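The abstract does not name the six transformations; a plausible set (three rotations, two flips, and a transpose, which is an assumption here) can be sketched with numpy:

```python
import numpy as np

def enrich(img):
    """Six basic geometric transformations for data enrichment.

    The exact six used in the study are an assumption: three rotations,
    two flips, and a transpose."""
    return {
        "rot90":    np.rot90(img, 1),
        "rot180":   np.rot90(img, 2),   # reported best single transform
        "rot270":   np.rot90(img, 3),
        "flip_lr":  np.fliplr(img),     # horizontal mirror
        "flip_ud":  np.flipud(img),     # flip upside down: reported worst
        "transpose": img.T if img.ndim == 2 else np.transpose(img, (1, 0, 2)),
    }

img = np.arange(12).reshape(3, 4)       # toy grayscale "image"
variants = enrich(img)
print({k: v.shape for k, v in variants.items()})
```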
( 2
min )
The Monge-Amp\`ere equation is a fully nonlinear partial differential
equation (PDE) of fundamental importance in analysis, geometry and in the
applied sciences. In this paper we solve the Dirichlet problem associated with
the Monge-Amp\`ere equation using neural networks and we show that an ansatz
using deep input convex neural networks can be used to find the unique convex
solution. As part of our analysis we study the effect of singularities,
discontinuities and noise in the source function, we consider nontrivial
domains, and we investigate how the method performs in higher dimensions. We
investigate the convergence numerically and present error estimates based on a
stability result. We also compare this method to an alternative approach in
which standard feed-forward networks are used together with a loss function
which penalizes lack of convexity.
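The problem being solved, and a natural least-squares training loss for a network ansatz $u_\theta$ (the exact loss weighting is an assumption here, not taken from the paper):

```latex
% Dirichlet problem for the Monge-Ampère equation
\det D^2 u = f \quad \text{in } \Omega, \qquad
u = g \quad \text{on } \partial\Omega, \qquad u \ \text{convex}.

% Residual loss over interior collocation points x_i and boundary points y_j,
% with \lambda > 0 a boundary penalty weight:
L(\theta) = \frac{1}{N}\sum_{i=1}^{N}\big(\det D^2 u_\theta(x_i) - f(x_i)\big)^2
          + \frac{\lambda}{M}\sum_{j=1}^{M}\big(u_\theta(y_j) - g(y_j)\big)^2
```

An input convex network guarantees convexity of $u_\theta$ by construction, which is why it selects the unique convex solution without the convexity penalty needed by standard feed-forward networks.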
( 2
min )
The identification of key factors such as medications, diseases, and
relationships within electronic health records and clinical notes has a wide
range of applications in the clinical field. In the N2C2 2022 competitions,
various tasks were presented to promote the identification of key factors in
electronic health records (EHRs) using the Contextualized Medication Event
Dataset (CMED). Pretrained large language models (LLMs) demonstrated
exceptional performance in these tasks. This study aims to explore the
utilization of LLMs, specifically ChatGPT, for data augmentation to overcome
the limited availability of annotated data for identifying the key factors in
EHRs. Additionally, different pre-trained BERT models, initially trained on
extensive datasets like Wikipedia and MIMIC, were employed to develop models
for identifying these key variables in EHRs through fine-tuning on augmented
datasets. The experimental results of two EHR analysis tasks, namely medication
identification and medication event classification, indicate that data
augmentation based on ChatGPT proves beneficial in improving performance for
both medication identification and medication event classification.
( 2
min )
Hyperspectral images (HSIs) cover hundreds or thousands of narrow spectral
bands, conveying a wealth of spatial and spectral information. However, due to
instrumental errors and atmospheric changes, HSIs obtained in
practice are often contaminated by noise and dead pixels (lines), resulting in
missing information that may severely compromise the subsequent applications.
We introduce here a novel HSI missing pixel prediction algorithm, called Low
Rank and Sparsity Constraint Plug-and-Play (LRS-PnP). It is shown that LRS-PnP
is able to predict missing pixels and bands even when all spectral bands of the
image are missing. The proposed LRS-PnP algorithm is further extended to a
self-supervised model by combining the LRS-PnP with the Deep Image Prior (DIP),
called LRS-PnP-DIP. In a series of experiments with real data, it is shown that
LRS-PnP-DIP achieves inpainting performance that matches or exceeds that of
other state-of-the-art learning-based methods.
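A toy version of the low-rank half of the idea: fill missing entries by alternating between truncating to a low-rank matrix and restoring the observed pixels. This is a low-rank-only sketch; LRS-PnP additionally uses a sparsity prior and a plug-and-play denoiser:

```python
import numpy as np

def lowrank_inpaint(Y, mask, rank=3, n_iter=100):
    """Alternate rank truncation with projection onto the observed entries."""
    X = np.where(mask, Y, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # project onto rank-r matrices
        X = np.where(mask, Y, X)                   # keep observed pixels fixed
    return X

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 30))  # exactly rank 3
mask = rng.random(A.shape) > 0.4                         # ~60% of entries observed
X = lowrank_inpaint(A, mask)
print(np.linalg.norm(X - A) / np.linalg.norm(A))         # small relative error
```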
( 2
min )
End-to-end (E2E) systems have shown comparable performance to hybrid systems
for automatic speech recognition (ASR). Word timings, as a by-product of ASR,
are essential in many applications, especially for subtitling and
computer-aided pronunciation training. In this paper, we improve the
frame-level classifier for word timings in E2E system by introducing label
priors in connectionist temporal classification (CTC) loss, which is adopted
from prior works, and combining low-level Mel-scale filter banks with
high-level ASR encoder output as input feature. On the internal Chinese corpus,
the proposed method achieves 95.68%/94.18% on the word timing accuracy metrics,
compared to 93.0%/90.22% for the hybrid system. It also surpasses a previous
E2E approach, with an absolute improvement of 4.80%/8.02% on the metrics on 7
languages. In addition, we further improve word timing accuracy by delaying CTC
peaks with frame-wise knowledge distillation, though only experimenting on
LibriSpeech.
( 2
min )
We study the problem of learning a single neuron with respect to the
$L_2^2$-loss in the presence of adversarial label noise. We give an efficient
algorithm that, for a broad family of activations including ReLUs, approximates
the optimal $L_2^2$-error within a constant factor. Our algorithm applies under
much milder distributional assumptions compared to prior work. The key
ingredient enabling our results is a novel connection to local error bounds
from optimization theory.
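For orientation, the learning problem itself can be sketched with plain gradient descent on the $L_2^2$ loss for one ReLU neuron. Note this is only a baseline sketch on noiseless data; the paper's algorithm instead exploits local error bounds and handles adversarial label noise:

```python
import numpy as np

def fit_relu_neuron(X, y, lr=0.1, n_steps=500, seed=0):
    """Gradient descent on the L2^2 loss for a single ReLU neuron y ~ relu(w.x)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1]) * 0.1
    n = len(X)
    for _ in range(n_steps):
        z = X @ w
        pred = np.maximum(z, 0.0)
        # Subgradient of the mean squared error through the ReLU.
        w -= lr * (X.T @ ((pred - y) * (z > 0)) * (2.0 / n))
    return w

rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0])
X = rng.normal(size=(500, 2))
y = np.maximum(X @ w_true, 0.0)          # noiseless labels for the sketch
w_hat = fit_relu_neuron(X, y)
print(np.linalg.norm(w_hat - w_true))    # close to zero in the realizable case
```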
( 2
min )
Graph Neural Networks (GNNs) are able to achieve high classification accuracy
on many important real world datasets, but provide no rigorous notion of
predictive uncertainty. Quantifying the confidence of GNN models is difficult
due to the dependence between datapoints induced by the graph structure.
We leverage recent advances in conformal prediction to construct prediction
sets for node classification in inductive learning scenarios. We do this by
taking an existing approach for conformal classification that relies on
\textit{exchangeable} data and modifying it by appropriately weighting the
conformal scores to reflect the network structure. We show through experiments
on standard benchmark datasets using popular GNN models that our approach
provides tighter and better calibrated prediction sets than a naive application
of conformal prediction.
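The exchangeable baseline that the paper modifies can be sketched in a few lines of split conformal classification; the graph-structure reweighting of the scores is the paper's contribution and is not shown here. The numbers are a hand-checkable toy example:

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction sets for classification.

    Score = 1 - predicted probability of the true class; the paper
    additionally reweights these scores using the network structure."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile level.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, q_level, method="higher")
    return test_probs >= 1.0 - qhat        # boolean class membership per test point

cal_probs = np.array([[0.9, 0.05, 0.05],
                      [0.8, 0.1, 0.1],
                      [0.1, 0.7, 0.2],
                      [0.2, 0.2, 0.6]])
cal_labels = np.array([0, 0, 1, 2])
test_probs = np.array([[0.7, 0.2, 0.1]])
sets = conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.25)
print(sets)   # only class 0 enters the prediction set
```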
( 2
min )
We propose strategies to estimate and make inference on key features of
heterogeneous effects in randomized experiments. These key features include
best linear predictors of the effects using machine learning proxies, average
effects sorted by impact groups, and average characteristics of most and least
impacted units. The approach is valid in high dimensional settings, where the
effects are proxied (but not necessarily consistently estimated) by predictive
and causal machine learning methods. We post-process these proxies into
estimates of the key features. Our approach is generic, it can be used in
conjunction with penalized methods, neural networks, random forests, boosted
trees, and ensemble methods, both predictive and causal. Estimation and
inference are based on repeated data splitting to avoid overfitting and achieve
validity. We use quantile aggregation of the results across many potential
splits, in particular taking medians of p-values and medians and other
quantiles of confidence intervals. We show that quantile aggregation lowers
estimation risks over a single split procedure, and establish its principal
inferential properties. Finally, our analysis reveals ways to build provably
better machine learning proxies through causal learning: we can use the
objective functions that we develop to construct the best linear predictors of
the effects, to obtain better machine learning proxies in the initial step. We
illustrate the use of both inferential tools and causal learners with a
randomized field experiment that evaluates a combination of nudges to stimulate
demand for immunization in India.
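The quantile-aggregation step can be sketched concretely. Doubling the median p-value across splits is the standard validity correction for median aggregation (a sketch of the construction, assuming the factor-2 adjustment described in this line of work):

```python
import numpy as np

def aggregate_pvalues(pvals):
    """Twice the median p-value across splits is a valid p-value
    under the median-aggregation correction."""
    return min(1.0, 2.0 * float(np.median(pvals)))

def aggregate_ci(lowers, uppers):
    """Aggregate per-split confidence intervals by taking medians
    of the lower and upper endpoints."""
    return float(np.median(lowers)), float(np.median(uppers))

pvals = [0.004, 0.010, 0.021, 0.002, 0.015]   # p-values from 5 random splits
print(aggregate_pvalues(pvals))               # 2 x median = 0.02
lo, hi = aggregate_ci([0.8, 1.0, 0.9], [2.1, 2.4, 2.2])
print(lo, hi)                                 # (0.9, 2.2)
```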
( 3
min )
In online advertising markets, budget-constrained advertisers acquire ad
placements through repeated bidding in auctions on various platforms. We
present a strategy for bidding optimally in a set of auctions that may or may
not be incentive-compatible under the presence of budget constraints. Our
strategy maximizes the expected total utility across auctions while satisfying
the advertiser's budget constraints in expectation. Additionally, we
investigate the online setting where the advertiser must submit bids across
platforms while learning about other bidders' bids over time. Our algorithm has
$O(T^{3/4})$ regret under the full-information setting. Finally, we demonstrate
that our algorithms have superior cumulative regret on both synthetic and
real-world datasets of ad placement auctions, compared to existing adaptive
pacing algorithms.
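For reference, the adaptive-pacing baseline that the paper compares against can be sketched as bid shading with a dual multiplier updated toward the per-round target spend. This is a sketch of the baseline under simplifying assumptions (second-price auctions, hard budget stop), not the paper's algorithm:

```python
import numpy as np

def adaptive_pacing(values, prices, budget, lr=0.01):
    """Shade bids by 1/(1+mu); update mu by dual ascent on the budget constraint."""
    T = len(values)
    rho = budget / T                           # target spend per round
    mu, spend, utility = 0.0, 0.0, 0.0
    for v, p in zip(values, prices):
        bid = v / (1.0 + mu)
        if bid >= p and spend + p <= budget:   # win, pay second price p
            spend += p
            utility += v - p
            cost = p
        else:
            cost = 0.0
        mu = max(0.0, mu + lr * (cost - rho))  # spend above target -> shade more
    return utility, spend

rng = np.random.default_rng(0)
values = rng.uniform(0, 1, 1000)
prices = rng.uniform(0, 1, 1000)               # highest competing bids
utility, spend = adaptive_pacing(values, prices, budget=100.0)
print(utility, spend)
```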
( 2
min )
Measuring entity relatedness is a fundamental task for many natural language
processing and information retrieval applications. Prior work often studies
entity relatedness in static settings and in an unsupervised manner. However,
entities in the real world are often involved in many different relationships;
consequently, entity relations are highly dynamic over time. In this work, we
propose a neural network-based approach for dynamic entity relatedness,
leveraging the collective attention as supervision. Our model is capable of
learning rich and different entity representations in a joint framework.
Through extensive experiments on large-scale datasets, we demonstrate that our
method achieves better results than competitive baselines.
( 2
min )
Can Boards of Directors steer AI business ethics? As enterprises implement complex AI capabilities as part of business strategy, corporate boards face an increasingly common moral quandary: do the business (and, of course, monetary) benefits of AI outweigh the societal risks? In his three-part series that continues this week, Bill Schmarzo examines the board’s…
The post DSC Weekly 13 June 2023 – Can Boards of Directors’ steer AI business ethics? appeared first on Data Science Central.
( 20
min )
This three-part series outlines the challenges and actions that the Board of Directors for organizations must address as they guide their organization’s responsible and ethical deployment of Artificial Intelligence (AI). Part one covered mitigating the impacts of AI Confirmation Bias. In part two, we discuss the potential unintended consequences of AI deployment and how to…
The post Artificial Intelligence: A Board of Directors Challenge – Part II appeared first on Data Science Central.
( 21
min )
The use of self-supervision from image-text pairs has been a key enabler in the development of scalable and flexible vision-language AI models in not only general domains but also in biomedical domains such as radiology. The goal in the radiology setting is to produce rich training signals without requiring manual labels so the models can […]
The post Accounting for past imaging studies: Enhancing radiology AI and reporting appeared first on Microsoft Research.
( 15
min )
This post is co-written with Jad Chamoun, Director of Engineering at Forethought Technologies, Inc. and Salina Wu, Senior ML Engineer at Forethought Technologies, Inc. Forethought is a leading generative AI suite for customer service. At the core of its suite is the innovative SupportGPT™ technology which uses machine learning to transform the customer support lifecycle—increasing deflection, […]
( 13
min )
Implementing a modern data architecture provides a scalable method to integrate data from disparate sources. By organizing data by business domains instead of infrastructure, each domain can choose tools that suit their needs. Organizations can maximize the value of their modern data architecture with generative AI solutions while innovating continuously. The natural language capabilities allow […]
( 10
min )
This post discusses how to structure internal knowledge sharing using Amazon Kendra and AWS Lambda and how Amazon Kendra solves the obstacles around knowledge sharing many companies face.
( 9
min )
The size of machine learning (ML) models, including large language models (LLMs) and foundation models (FMs), is growing fast year over year, and these models need faster and more powerful accelerators, especially for generative AI. AWS Inferentia2 was designed from the ground up to deliver higher performance while lowering the cost of LLMs and generative AI inference. In this […]
( 11
min )
Last week, Technology Innovation Institute (TII) launched TII Falcon LLM, an open-source foundational large language model (LLM). Trained on 1 trillion tokens with Amazon SageMaker, Falcon boasts top-notch performance (#1 on the Hugging Face leaderboard at time of writing) while being comparatively lightweight and less expensive to host than other LLMs such as llama-65B. In […]
( 10
min )
Rendered.ai is easing AI training for developers, data scientists and others with its platform-as-a-service for synthetic data generation, or SDG. Training computer vision AI models requires massive, high-quality, diverse and unbiased datasets. These can be challenging and costly to obtain, especially with increasing demands both of and for AI. The Rendered.ai platform-as-a-service helps to solve […]
( 6
min )
For industrial businesses to reach the next level of digitalization, they need to create accurate, virtual representations of their physical systems. NVIDIA is working with Hexagon, the Stockholm-based global leader in digital reality solutions combining sensor, software and autonomous technologies, to equip enterprises with the tools and solutions they need to build physically accurate, perfectly […]
( 5
min )
We’re announcing updates including more steerable API models, function calling capabilities, longer context, and lower prices.
( 4
min )
Open-source large language models (LLMs) have become popular, allowing researchers, developers, and organizations to access these models to foster innovation and experimentation. This encourages collaboration from the open-source community to contribute to the development and improvement of LLMs. Open-source LLMs provide transparency into the model architecture, training process, and training data, which allows researchers to understand […]
( 12
min )
GPT-J is an open-source 6-billion-parameter model released by Eleuther AI. The model is trained on the Pile and can perform various tasks in language processing. It can support a wide variety of use cases, including text classification, token classification, text generation, question answering, entity extraction, summarization, sentiment analysis, and many more. GPT-J is a […]
( 10
min )
Exciting news! The highly anticipated AI Workshops begin in ~24 hours, and we don’t want you to miss out on this incredible opportunity!
( 4
min )
Kirk Kaiser grew up a fan of the video game Paperboy, where players act as cyclists delivering newspapers while encountering various obstacles, like ramps that appear in the middle of the street. This was the inspiration behind the software developer’s latest project using the NVIDIA Jetson platform for edge AI and robotics — a self-driving […]
( 6
min )
A new AI-based approach for controlling autonomous robots satisfies the often-conflicting goals of safety and stability.
( 9
min )
For reinforcement learning systems to be widely adopted, their users must
understand and trust them. We present a theoretical analysis of explaining
reinforcement learning using Shapley values, following a principled approach
from game theory for identifying the contribution of individual players to the
outcome of a cooperative game. We call this general framework Shapley Values
for Explaining Reinforcement Learning (SVERL). Our analysis exposes the
limitations of earlier uses of Shapley values in reinforcement learning. We
then develop an approach that uses Shapley values to explain agent performance.
In a variety of domains, SVERL produces meaningful explanations that match and
supplement human intuition.
( 2
min )
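As background, the Shapley value that SVERL builds on can be computed exactly for small cooperative games by enumerating coalitions. This is a minimal sketch of the game-theoretic quantity itself, not SVERL's estimator:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values by enumerating all coalitions.

    v maps a set of players to the value of that coalition.
    Exponential in len(players); fine only for toy games.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # weighted marginal contribution of player i to coalition S
                total += weight * (v(set(S) | {i}) - v(set(S)))
        phi[i] = total
    return phi
```

For the additive game v(S) = |S|, every player receives value 1, and for a game where only player 1 matters, all credit goes to player 1.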
We study the Pareto frontier of two archetypal objectives in multi-armed
bandits, namely, regret minimization (RM) and best arm identification (BAI)
with a fixed horizon. It is folklore that the balance between exploitation and
exploration is crucial for both RM and BAI, but exploration is more critical in
achieving the optimal performance for the latter objective. To this end, we
design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm. Complementarily, by
establishing lower bounds on the regret achievable by any algorithm with a
given BAI failure probability, we show that (i) no algorithm can simultaneously
perform optimally for both the RM and BAI objectives, and (ii)
BoBW-lil'UCB$(\gamma)$ achieves order-wise optimal performance for RM or BAI
under different values of $\gamma$. Our work elucidates the trade-off more
precisely by showing how the constants in previous works depend on certain
hardness parameters. Finally, we show that BoBW-lil'UCB outperforms a close
competitor UCB$_\alpha$ (Degenne et al., 2019) in terms of the time complexity
and the regret on diverse datasets such as MovieLens and Published Kinase
Inhibitor Set.
( 2
min )
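For orientation, the classic UCB1 index policy, a simpler relative of the lil'UCB family analyzed above (illustrative only, not the BoBW-lil'UCB$(\gamma)$ algorithm), can be sketched on a toy bandit with deterministic rewards:

```python
import math

def ucb1(arm_means, horizon):
    """UCB1 on a toy bandit whose rewards equal the arm means (no noise).

    Pull the arm maximizing empirical mean + sqrt(2 ln t / n_i): the bonus
    drives exploration, the mean drives exploitation (regret minimization).
    """
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once first
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        counts[arm] += 1
        sums[arm] += arm_means[arm]  # deterministic toy reward
    return counts
```

On means [0.2, 0.5, 0.9] with horizon 200, the best arm dominates the pull counts while the exploration bonus still forces occasional pulls of the others.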
While reinforcement learning (RL) has achieved great success in acquiring
complex skills solely from environmental interactions, it assumes that resets
to the initial state are readily available at the end of each episode. Such an
assumption hinders the autonomous learning of embodied agents due to the
time-consuming and cumbersome workarounds for resetting in the physical world.
Hence, there has been a growing interest in autonomous RL (ARL) methods that
are capable of learning from non-episodic interactions. However, existing works
on ARL are limited by their reliance on prior data and are unable to learn in
environments where task-relevant interactions are sparse. In contrast, we
propose a demonstration-free ARL algorithm via Implicit and Bi-directional
Curriculum (IBC). With an auxiliary agent that is conditionally activated upon
learning progress and a bidirectional goal curriculum based on optimal
transport, our method outperforms previous methods, even the ones that leverage
demonstrations.
( 2
min )
Digital transformation has not yet extended to building an African face shape classifier. African women rely on beauty standards, recommendations, personal preference, or the newest trends in hairstyles to decide on the appropriate hairstyle for them. In this paper, an approach is presented that uses K-means clustering to classify images of African women. In order to identify potential facial clusters, Haar cascade is used for feature-based training, and K-means clustering is applied for image classification.
( 2
min )
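The K-means step in the pipeline above can be sketched with a minimal Lloyd's-algorithm loop (shown here on toy 2-D points rather than facial features; the Haar-cascade feature extraction is omitted, and the deterministic initialization is a simplification):

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with deterministic init (first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [min(range(k),
                      key=lambda c: sum((pi - ci) ** 2
                                        for pi, ci in zip(p, centroids[c])))
                  for p in points]
        # update step: move each centroid to the mean of its cluster
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return labels, centroids
```

On two well-separated blobs, the loop recovers the blob structure within a couple of iterations.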
We develop and analyze a general technique for learning with an unknown
distribution drift. Given a sequence of independent observations from the last
$T$ steps of a drifting distribution, our algorithm agnostically learns a
family of functions with respect to the current distribution at time $T$.
Unlike previous work, our technique does not require prior knowledge about the
magnitude of the drift. Instead, the algorithm adapts to the sample data.
Without explicitly estimating the drift, the algorithm learns a family of
functions with almost the same error as a learning algorithm that knows the
magnitude of the drift in advance. Furthermore, since our algorithm adapts to
the data, it can guarantee a better learning error than an algorithm that
relies on loose bounds on the drift.
( 2
min )
Data heterogeneity across clients is a key challenge in federated learning.
Prior works address this by either aligning client and server models or using
control variates to correct client model drift. Although these methods achieve
fast convergence in convex or simple non-convex problems, the performance in
over-parameterized models such as deep neural networks is lacking. In this
paper, we first revisit the widely used FedAvg algorithm in a deep neural
network to understand how data heterogeneity influences the gradient updates
across the neural network layers. We observe that while the feature extraction
layers are learned efficiently by FedAvg, the substantial diversity of the
final classification layers across clients impedes the performance. Motivated
by this, we propose to correct model drift by variance reduction only on the
final layers. We demonstrate that this significantly outperforms existing
benchmarks at a similar or lower communication cost. We further provide a
proof of the convergence rate of our algorithm.
( 2
min )
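For reference, the plain FedAvg aggregation step that the paper starts from can be sketched as follows (the paper's actual contribution, variance reduction applied only to the final classification layer, is not reproduced here):

```python
def fedavg(client_models, weights=None):
    """Plain FedAvg: weighted coordinate-wise average of client parameters,
    layer by layer. Each client model is a dict mapping layer name to a
    flat list of parameters."""
    n = len(client_models)
    if weights is None:
        weights = [1.0 / n] * n  # uniform client weighting
    agg = {}
    for layer in client_models[0]:
        agg[layer] = [sum(w * m[layer][j] for w, m in zip(weights, client_models))
                      for j in range(len(client_models[0][layer]))]
    return agg
```

Averaging two clients coordinate-wise gives the midpoint of their parameters; the paper's observation is that this suffices for feature-extraction layers but not for the heterogeneous final layers.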
Adaptive gating plays a key role in temporal data processing via classical
recurrent neural networks (RNN), as it facilitates retention of past
information necessary to predict the future, providing a mechanism that
preserves invariance to time warping transformations. This paper builds on
quantum recurrent neural networks (QRNNs), a dynamic model with quantum memory,
to introduce a novel class of temporal data processing quantum models that
preserve invariance to time-warping transformations of the (classical)
input-output sequences. The model, referred to as time warping-invariant QRNN
(TWI-QRNN), augments a QRNN with a quantum-classical adaptive gating mechanism
that chooses whether to apply a parameterized unitary transformation at each
time step as a function of the past samples of the input sequence via a
classical recurrent model. The TWI-QRNN model class is derived from first
principles, and its capacity to successfully implement time-warping
transformations is experimentally demonstrated on examples with classical or
quantum dynamics.
( 2
min )
The Linear-Quadratic Regulation (LQR) problem with unknown system parameters
has been widely studied, but it has remained unclear whether
$\tilde{\mathcal{O}}(\sqrt{T})$ regret, which is the best known dependence on time, can
be achieved almost surely. In this paper, we propose an adaptive LQR controller
with an almost-sure $\tilde{\mathcal{O}}(\sqrt{T})$ regret upper bound. The
controller features a circuit-breaking mechanism, which circumvents potential
safety breaches and guarantees the convergence of the system parameter estimate,
but is shown to be triggered only finitely often and hence has negligible
effect on the asymptotic performance of the controller. The proposed controller
is also validated via simulation on the Tennessee Eastman Process (TEP), a
commonly used industrial process example.
( 2
min )
Unsupervised video domain adaptation is a practical yet challenging task. In
this work, for the first time, we tackle it from a disentanglement view. Our
key idea is to handle the spatial and temporal domain divergence separately
through disentanglement. Specifically, we consider the generation of
cross-domain videos from two sets of latent factors, one encoding the static
information and another encoding the dynamic information. A Transfer Sequential
VAE (TranSVAE) framework is then developed to model such generation. To better
serve adaptation, we propose several objectives to constrain the latent
factors. With these constraints, the spatial divergence can be readily removed
by disentangling the static domain-specific information out, and the temporal
divergence is further reduced from both frame- and video-levels through
adversarial learning. Extensive experiments on the UCF-HMDB, Jester, and
Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE
compared with several state-of-the-art methods. The code with reproducible
results is publicly accessible.
( 2
min )
A generic, fast and asymptotically efficient method for parametric estimation
is described. It is based on stochastic gradient descent on the
log-likelihood function, corrected by a single step of the Fisher scoring
algorithm. We show, theoretically and by simulations in the i.i.d. setting, that
it is an interesting alternative to the usual stochastic gradient descent with
averaging or the adaptive stochastic gradient descent.
( 2
min )
We present L0Learn: an open-source package for sparse linear regression and
classification using $\ell_0$ regularization. L0Learn implements scalable,
approximate algorithms, based on coordinate descent and local combinatorial
optimization. The package is built using C++ and has user-friendly R and Python
interfaces. L0Learn can address problems with millions of features, achieving
competitive run times and statistical performance with state-of-the-art sparse
learning packages. L0Learn is available on both CRAN and GitHub
(https://cran.r-project.org/package=L0Learn and
https://github.com/hazimehh/L0Learn).
( 2
min )
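The optimization problem L0Learn targets, least squares under an $\ell_0$ sparsity constraint, can be illustrated with a simple iterative-hard-thresholding heuristic. This is a sketch of the problem class only, not L0Learn's coordinate-descent and local combinatorial search algorithms, and the step size is assumed suitable for the design matrix:

```python
def iht(X, y, k, steps=60, eta=0.5):
    """Iterative hard thresholding for l0-constrained least squares:
    gradient step on the residual, then keep only the k largest coefficients."""
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(steps):
        resid = [yi - sum(Xi[j] * beta[j] for j in range(d))
                 for yi, Xi in zip(y, X)]
        grad = [sum(X[i][j] * resid[i] for i in range(len(y))) for j in range(d)]
        beta = [b + eta * g for b, g in zip(beta, grad)]
        # the l0 constraint: zero out all but the k largest-magnitude entries
        keep = sorted(range(d), key=lambda j: -abs(beta[j]))[:k]
        beta = [beta[j] if j in keep else 0.0 for j in range(d)]
    return beta
```

On an identity design with a near-sparse target, the iteration recovers the two dominant coefficients exactly and zeroes the rest.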
We study the Conjugate Kernel associated to a multi-layer linear-width
feed-forward neural network with random weights, biases and data. We show that
the empirical spectral distribution of the Conjugate Kernel converges to a
deterministic limit. More precisely, we obtain a deterministic equivalent for
its Stieltjes transform and its resolvent, with quantitative bounds involving
both the dimension and the spectral parameter. The limiting equivalent objects
are described by iterating free convolution of measures and classical matrix
operations involving the parameters of the model.
( 2
min )
Annotating data for supervised learning can be costly. When the annotation
budget is limited, active learning can be used to select and annotate those
observations that are likely to give the most gain in model performance. We
propose an active learning algorithm that, in addition to selecting which
observation to annotate, selects the precision of the annotation that is
acquired. Assuming that annotations with low precision are cheaper to obtain,
this allows the model to explore a larger part of the input space, with the
same annotation budget. We build our acquisition function on the previously
proposed BALD objective for Gaussian Processes, and empirically demonstrate the
gains of being able to adjust the annotation precision in the active learning
loop.
( 2
min )
We consider a simulation optimization problem for context-dependent
decision-making, which aims to determine the top-m designs for all contexts.
Under a Bayesian framework, we formulate the optimal dynamic sampling decision
as a stochastic dynamic programming problem and develop a sequential sampling
policy to efficiently learn the performance of each design under each context.
The asymptotically optimal sampling ratios are derived to attain the optimal
large-deviations rate for the worst-case probability of false selection. The
proposed sampling policy is proved to be consistent, and its asymptotic sampling
ratios are asymptotically optimal. Numerical experiments demonstrate that the
proposed method improves the efficiency of selecting the top-m
context-dependent designs.
( 2
min )
In privacy-preserving machine learning, differentially private stochastic
gradient descent (DP-SGD) performs worse than SGD due to per-sample gradient
clipping and noise addition. A recent focus in private learning research is
improving the performance of DP-SGD on private data by incorporating priors
that are learned on real-world public data. In this work, we explore how we can
improve the privacy-utility tradeoff of DP-SGD by learning priors from images
generated by random processes and transferring these priors to private data. We
propose DP-RandP, a three-phase approach. We attain new state-of-the-art
accuracy when training from scratch on CIFAR10, CIFAR100, and MedMNIST for a
range of privacy budgets $\varepsilon \in [1, 8]$. In particular, we improve
the previous best reported accuracy on CIFAR10 from $60.6 \%$ to $72.3 \%$ for
$\varepsilon=1$. Our code is available at
https://github.com/inspire-group/DP-RandP.
( 2
min )
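The DP-SGD step that underlies this line of work, per-sample gradient clipping followed by Gaussian noise, can be sketched as follows (a generic illustration of the mechanism, not the DP-RandP three-phase method):

```python
import math
import random

def dp_sgd_step(per_sample_grads, clip_norm, noise_mult, rng):
    """One DP-SGD aggregation: clip each per-sample gradient to L2 norm
    clip_norm, average, then add Gaussian noise with std noise_mult * clip_norm.
    These two operations are what cost DP-SGD utility relative to SGD."""
    clipped = []
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    n = len(clipped)
    d = len(clipped[0])
    return [sum(c[j] for c in clipped) / n
            + rng.gauss(0.0, noise_mult * clip_norm) / n
            for j in range(d)]
```

With the noise multiplier set to zero the step reduces to averaging of clipped gradients, which makes the clipping behavior easy to check in isolation.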
Learning with symmetric positive definite (SPD) matrices has many
applications in machine learning. Consequently, understanding the Riemannian
geometry of SPD matrices has attracted much attention lately. A particular
Riemannian geometry of interest is the recently proposed Bures-Wasserstein (BW)
geometry which builds on the Wasserstein distance between the Gaussian
densities. In this paper, we propose a novel generalization of the BW geometry,
which we call the GBW geometry. The proposed generalization is parameterized by
a symmetric positive definite matrix $\mathbf{M}$ such that when $\mathbf{M} =
\mathbf{I}$, we recover the BW geometry. We provide a rigorous treatment to
study various differential geometric notions on the proposed novel generalized
geometry which makes it amenable to various machine learning applications. We
also present experiments that illustrate the efficacy of the proposed GBW
geometry over the BW geometry.
( 2
min )
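The Bures-Wasserstein distance that the GBW geometry generalizes (the $\mathbf{M} = \mathbf{I}$ case) can be computed directly from its closed form, $d^2(A, B) = \mathrm{tr}(A) + \mathrm{tr}(B) - 2\,\mathrm{tr}\big((A^{1/2} B A^{1/2})^{1/2}\big)$; a minimal NumPy sketch:

```python
import numpy as np

def sqrtm_spd(S):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bw_distance(A, B):
    """Bures-Wasserstein distance between SPD matrices (the M = I case of GBW)."""
    sA = sqrtm_spd(A)
    cross = sqrtm_spd(sA @ B @ sA)  # sA B sA is symmetric PSD for SPD A, B
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(d2, 0.0)))
```

For example, between $I$ and $4I$ in two dimensions, $d^2 = 2 + 8 - 2 \cdot 4 = 2$.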
We develop simple differentially private optimization algorithms that move
along directions of (expected) descent to find an approximate second-order
solution for nonconvex ERM. We use line search, mini-batching, and a two-phase
strategy to improve the speed and practicality of the algorithm. Numerical
experiments demonstrate the effectiveness of these approaches.
( 2
min )
The goal of this paper is to revisit Kernel Principal Component Analysis
(KPCA) through the dualization of a difference of convex functions. This allows
us to naturally extend KPCA to multiple objective functions and leads to
efficient gradient-based algorithms that avoid the expensive SVD of the Gram
matrix. In particular, we consider objective functions that can be written as Moreau
envelopes, demonstrating how to promote robustness and sparsity within the same
framework. The proposed method is evaluated on synthetic and real-world
benchmarks, showing significant speedup in KPCA training time as well as
highlighting the benefits in terms of robustness and sparsity.
( 2
min )
Recipients Luis Antonio Benítez, Carolina Cuesta-Lazaro, and Fernando Romero López receive support for their scientific research.
( 6
min )
ONNX (Open Neural Network Exchange) is an open-source standard for representing deep learning models widely supported by many providers. ONNX provides tools for optimizing and quantizing models to reduce the memory and compute needed to run machine learning (ML) models. One of the biggest benefits of ONNX is that it provides a standardized format for […]
( 14
min )
We are excited to announce the open-source release of GraphStorm 0.1, a low-code enterprise graph machine learning (ML) framework to build, train, and deploy graph ML solutions on complex enterprise-scale graphs in days instead of months. With GraphStorm, you can build solutions that directly take into account the structure of relationships or interactions between billions […]
( 9
min )
Microsoft publicly endorsed OpenAI, with ‘Copilot’ embedded in every single bit of the Microsoft stack. Behind the scenes, with everything closed source, nobody knew if these AI assistants were driven by Cortana, Bing, or OpenAI. The assistant technology is not new, and other than code generation and assisted writing, some wonder what value…
The post The unannounced next-level partnership between Microsoft and Databricks appeared first on Data Science Central.
( 22
min )
High-quality app development can significantly drive your business growth and success while boosting customer satisfaction and bringing in more clients. However, with millions of apps existing in the market, standing out from the competition requires more than just a great idea and an appealing design. Data engineering is what can help you, playing a pivotal…
The post The Importance of Data Engineering for a Profitable App Development appeared first on Data Science Central.
( 21
min )
When meteor showers occur every few months, viewers get to watch a dazzling scene of shooting stars and light streaks scattering across the night sky. Normally, meteors are just small pieces of rock and dust from space that quickly burn up upon entering Earth’s atmosphere. But the story would take a darker turn if a […]
( 7
min )
Mixture-of-Experts (MoE) models have obtained state-of-the-art performance in
Neural Machine Translation (NMT) tasks. Existing works in MoE mostly consider a
homogeneous design where the same number of experts of the same size are placed
uniformly throughout the network. Furthermore, existing MoE works do not
consider computational constraints (e.g., FLOPs, latency) to guide their
design. To this end, we develop AutoMoE, a framework for designing
heterogeneous MoEs under computational constraints. AutoMoE leverages Neural
Architecture Search (NAS) to obtain efficient sparse MoE sub-transformers with
4x inference speedup (CPU) and FLOPs reduction over manually designed
Transformers, with parity in BLEU score over dense Transformers and within 1
BLEU point of the MoE SwitchTransformer, on aggregate over benchmark datasets for
NMT. A heterogeneous search space with dense and sparsely activated Transformer
modules (e.g., how many experts? where to place them? what should their
sizes be?) allows for adaptive compute, where different amounts of computation
are used for different tokens in the input. Adaptivity comes naturally from
routing decisions which send tokens to experts of different sizes. AutoMoE
code, data, and trained models are available at https://aka.ms/AutoMoE.
( 2
min )
This paper presents a novel Sequence-to-Sequence (Seq2Seq) model based on a
transformer-based attention mechanism and temporal pooling for Non-Intrusive
Load Monitoring (NILM) of smart buildings. The paper aims to improve the
accuracy of NILM by using a deep learning-based method. The proposed method
uses a Seq2Seq model with a transformer-based attention mechanism to capture
the long-term dependencies of NILM data. Additionally, temporal pooling is used
to improve the model's accuracy by capturing both the steady-state and
transient behavior of appliances. The paper evaluates the proposed method on a
publicly available dataset and compares the results with other state-of-the-art
NILM techniques. The results demonstrate that the proposed method outperforms
the existing methods in terms of both accuracy and computational efficiency.
( 2
min )
In recent years, indoor human presence detection based on supervised learning
(SL) and channel state information (CSI) has attracted much attention. However,
existing studies that rely on spatial information of CSI are susceptible to
environmental changes which degrade prediction accuracy. Moreover, SL-based
methods require time-consuming data labeling for retraining models. Therefore,
it is imperative to design a continuously monitored model using a
semi-supervised learning (SSL) based scheme. In this paper, we conceive a
bifold teacher-student (BTS) learning approach for indoor human presence
detection in an adjoining two-room scenario. The proposed SSL-based primal-dual
teacher-student network intelligently learns spatial and temporal features from
labeled and unlabeled CSI datasets. Additionally, the enhanced penalized loss
function leverages entropy and distance measures to distinguish drifted data,
i.e., features of new datasets affected by time-varying effects and altered
from the original distribution. Experimental results demonstrate that the
proposed BTS system sustains asymptotic accuracy after retraining the model
with unlabeled data. Furthermore, BTS outperforms existing SSL-based models,
achieving the highest detection accuracy while approaching the asymptotic
performance of SL-based methods.
( 2
min )
The use of Shap scores has become widespread in Explainable AI. However,
their computation is in general intractable, in particular when done with a
black-box classifier such as a neural network. Recent research has unveiled
classes of open-box Boolean circuit classifiers for which Shap can be computed
efficiently. We show how to transform binary neural networks into such
circuits for efficient Shap computation, using logic-based knowledge
compilation techniques. The performance gain is huge, as we show in the light
of our experiments.
( 2
min )
We have recently witnessed a number of impressive results on hard
mathematical reasoning problems with language models. At the same time, the
robustness of these models has also been called into question; recent works
have shown that models can rely on shallow patterns in the problem description
when generating a solution. Building on the idea of behavioral testing, we
propose a novel framework, which pins down the causal effect of various factors
in the input, e.g., the surface form of the problem text, the operands, and
math operators on the output solution. By grounding the behavioral analysis in
a causal graph describing an intuitive reasoning process, we study the behavior
of language models in terms of robustness and sensitivity to direct
interventions in the input space. We apply our framework to a test bed of math
word problems. Our analysis shows that robustness does not appear to
continuously improve as a function of size, but the GPT-3 Davinci models (175B)
achieve a dramatic improvement in both robustness and sensitivity compared to
all other GPT variants.
( 2
min )
Despite the intense attention and considerable investment into clinical
machine learning (CML) research, relatively few applications have been deployed
at large scale in a real-world clinical environment. While research is important
in advancing the state-of-the-art, translation is equally important in bringing
these techniques and technologies into a position to ultimately impact
healthcare. We believe a lack of appreciation for several considerations is a
major cause of this discrepancy between expectation and reality. To better
characterize a holistic perspective among researchers and practitioners, we
survey several practitioners with commercial experience in developing CML for
clinical deployment. Using these insights, we identify several main categories
of challenges in order to better design and develop clinical machine learning
applications.
( 2
min )
Denoising is intuitively related to projection. Indeed, under the manifold
hypothesis, adding random noise is approximately equivalent to orthogonal
perturbation. Hence, learning to denoise is approximately learning to project.
In this paper, we use this observation to reinterpret denoising diffusion
models as approximate gradient descent applied to the Euclidean distance
function. We then provide a straightforward convergence analysis of the DDIM
sampler under simple assumptions on the projection error of the denoiser.
Finally, we propose a new sampler based on two simple modifications to DDIM
using insights from our theoretical results. In as few as 5-10 function
evaluations, our sampler achieves state-of-the-art FID scores on pretrained
CIFAR-10 and CelebA models and can generate high quality samples on latent
diffusion models.
( 2
min )
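The "denoising ≈ projection" view can be made concrete on a toy manifold, the unit circle, where the ideal denoiser is exact orthogonal projection and the iteration reduces to gradient descent on half the squared distance function (an illustrative toy under the manifold hypothesis, not the proposed sampler):

```python
import math

def project_to_circle(x):
    """A perfect 'denoiser' for the unit circle: orthogonal projection."""
    n = math.hypot(x[0], x[1])
    return (x[0] / n, x[1] / n)

def descend(x, steps=20, eta=0.5):
    """Gradient descent on half the squared distance to the manifold:
    x <- x - eta * (x - proj(x)), the step a denoiser-as-projection implies."""
    for _ in range(steps):
        p = project_to_circle(x)
        x = (x[0] - eta * (x[0] - p[0]), x[1] - eta * (x[1] - p[1]))
    return x
```

Starting from (3, 4), the iterate stays on the ray through the origin and its radius contracts geometrically toward 1, i.e., onto the manifold.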
Privately generating synthetic data from a table is an important building block
of a privacy-first world. We propose and investigate a simple approach of treating
each row in a table as a sentence and training a language model with
differential privacy. We show this approach obtains competitive results in
modelling tabular data across multiple datasets, even at small scales that
favor alternative methods based on marginal distributions.
( 2
min )
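The row-to-sentence encoding at the heart of this approach is simple to sketch. The column/value template below is a guess at one plausible serialization, not the paper's exact format, and the differentially private language-model training itself is omitted:

```python
def row_to_sentence(row, columns):
    """Serialize one table row as a text 'sentence' a language model can train on."""
    return ", ".join(f"{col} is {row[col]}" for col in columns)

def table_to_corpus(rows, columns):
    """Treat each row as one training sentence, as the abstract describes."""
    return [row_to_sentence(r, columns) for r in rows]
```

A DP-trained language model over such sentences can then be sampled and the generated sentences parsed back into synthetic rows.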
We propose a new approach to constructing a neural network for predicting
expectations of stochastic differential equations. The proposed method does not
need data sets of inputs and outputs; instead, the information obtained from
the time-evolution equations, i.e., the corresponding dual process, is directly
compared with the weights in the neural network. As a demonstration, we
construct neural networks for the Ornstein-Uhlenbeck process and the noisy van
der Pol system. A remarkable feature of networks learned with the proposed
method is their accuracy for inputs near the origin. Hence, it would be possible
to avoid the overfitting problem because the learned network does not depend on
training data sets.
( 2
min )
The estimation of causal effects is a primary goal of behavioral, social,
economic and biomedical sciences. Under the unconfoundedness condition,
adjustment for confounders requires estimating the nuisance functions relating
outcome and/or treatment to confounders. This paper considers a generalized
optimization framework for efficient estimation of general treatment effects
using feedforward artificial neural networks (ANNs) when the number of
covariates is allowed to increase with the sample size. We estimate the
nuisance function by ANNs, and develop a new approximation error bound for the
ANNs approximators when the nuisance function belongs to a mixed Sobolev space.
We show that the ANNs can alleviate the curse of dimensionality under this
circumstance. We further establish the consistency and asymptotic normality of
the proposed treatment effects estimators, and apply a weighted bootstrap
procedure for conducting inference. The proposed methods are illustrated via
simulation studies and a real data application.
( 2
min )
We study the mean estimation problem under communication and local
differential privacy constraints. While previous work has proposed
\emph{order}-optimal algorithms for the same problem (i.e., asymptotically
optimal as we spend more bits), \emph{exact} optimality (in the non-asymptotic
setting) still has not been achieved. In this work, we take a step towards
characterizing the \emph{exact}-optimal approach in the presence of shared
randomness (a random variable shared between the server and the user) and
identify several necessary conditions for \emph{exact} optimality. We prove
that one of the necessary conditions is to utilize a rotationally symmetric
shared random codebook. Based on this, we propose a randomization mechanism
where the codebook is a randomly rotated simplex -- satisfying the necessary
properties of the \emph{exact}-optimal codebook. The proposed mechanism is
based on a $k$-closest encoding which we prove to be \emph{exact}-optimal for
the randomly rotated simplex codebook.
( 2
min )
Data scientists need a consistent and reproducible environment for machine learning (ML) and data science workloads that enables managing dependencies and is secure. AWS Deep Learning Containers already provides pre-built Docker images for training and serving models in common frameworks such as TensorFlow, PyTorch, and MXNet. To improve this experience, we announced a public beta […]
( 8
min )
Customers expect quick and efficient service from businesses in today’s fast-paced world. But providing excellent customer service can be significantly challenging when the volume of inquiries outpaces the human resources employed to address them. However, businesses can meet this challenge while providing personalized and efficient customer service with the advancements in generative artificial intelligence (generative […]
( 11
min )
Amazon Personalize now enables popularity tuning for its Similar-Items recipe (aws-similar-items). Similar-Items generates recommendations that are similar to the item that a user selects, helping users discover new items in your catalog based on the previous behavior of all users and item metadata. Previously, this capability was only available for SIMS, the other Related_Items recipe […]
( 5
min )
By applying a language model to protein-drug interactions, researchers can quickly screen large libraries of potential drug compounds.
( 9
min )
The scientists used a natural language-based logical inference dataset to create smaller language models that outperformed much larger counterparts.
( 9
min )
Get into your favorite games faster by linking GeForce NOW to Steam, Epic Games Store and Ubisoft accounts. And get a peek at more games coming to GeForce NOW later this year by tuning in to Ubisoft Forward on Monday, June 12, when the game publisher will reveal its latest news and announcements. Plus, two Read article >
( 5
min )
Emre Kiciman and Amit Sharma join Ashley Llorens to discuss the causal capabilities of LLMs and ongoing journeys with GPT-3.5 and GPT-4 in the newest episode of the Microsoft Research Podcast series, "AI Frontiers."
The post AI Frontiers: The future of causal reasoning with Emre Kiciman and Amit Sharma appeared first on Microsoft Research.
( 30
min )
We introduce a randomized topological augmentor based on Schur complements
for Graph Contrastive Learning (GCL). Given a graph laplacian matrix, the
technique generates unbiased approximations of its Schur complements and treats
the corresponding graphs as augmented views. We discuss the benefits of our
approach, provide theoretical justifications and present connections with graph
diffusion. Unlike previous efforts, we study the empirical effectiveness of the
augmentor in a controlled fashion by varying the design choices for subsequent
GCL phases, such as encoding and contrasting. Extensive experiments on node and
graph classification benchmarks demonstrate that our technique consistently
outperforms pre-defined and adaptive augmentation approaches to achieve
state-of-the-art results.
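For intuition, the deterministic quantity that the randomized augmentor approximates can be computed directly: partitioning the Laplacian over kept and eliminated vertices, the Schur complement L_kk - L_ke L_ee^{-1} L_ek is again a graph Laplacian on the kept vertices. A minimal exact-version sketch (the paper's contribution is the unbiased randomized approximation, not this direct computation):

```python
import numpy as np

def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(adj.sum(axis=1)) - adj

def schur_complement(L, keep):
    """Exact Schur complement of L onto the vertex subset `keep`:
    L_kk - L_ke L_ee^{-1} L_ek.  Assumes the eliminated block L_ee is
    nonsingular (true when eliminating a proper subset of a connected graph).
    The result is a (possibly weighted) Laplacian on `keep`."""
    keep = np.asarray(keep)
    elim = np.setdiff1d(np.arange(L.shape[0]), keep)
    L_kk = L[np.ix_(keep, keep)]
    L_ke = L[np.ix_(keep, elim)]
    L_ee = L[np.ix_(elim, elim)]
    return L_kk - L_ke @ np.linalg.solve(L_ee, L_ke.T)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph 0-1-2
S = schur_complement(laplacian(A), keep=[0, 2])               # eliminate vertex 1
```

Eliminating the middle vertex of a path leaves a single edge of weight 1/2 between the endpoints, and the rows of the result still sum to zero, as for any Laplacian.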
( 2
min )
Efficient large-scale neural network training and inference on commodity CPU
hardware is of immense practical significance in democratizing deep learning
(DL) capabilities. Presently, the process of training massive models consisting
of hundreds of millions to billions of parameters requires the extensive use of
specialized hardware accelerators, such as GPUs, which are only accessible to a
limited number of institutions with considerable financial resources. Moreover,
there is often an alarming carbon footprint associated with training and
deploying these models. In this paper, we take a step towards addressing these
challenges by introducing BOLT, a sparse deep learning library for training
large-scale search and recommendation models on standard CPU hardware. BOLT
provides a flexible, high-level API for constructing models that will be
familiar to users of existing popular DL frameworks. By automatically tuning
specialized hyperparameters, BOLT also abstracts away the algorithmic details
of sparse network training. We evaluate BOLT on a number of information
retrieval tasks including product recommendations, text classification, graph
neural networks, and personalization. We find that our proposed system achieves
competitive performance with state-of-the-art techniques at a fraction of the
cost and energy consumption and an order-of-magnitude faster inference time.
BOLT has also been successfully deployed by multiple businesses to address
critical problems, and we highlight one customer deployment case study in the
field of e-commerce.
( 3
min )
In a context of malicious software detection, machine learning (ML) is widely
used to generalize to new malware. However, it has been demonstrated that ML
models can be fooled or may fail to generalize to previously unseen malware. We
investigate the possible benefits of quantum algorithms for
classification tasks. We implement two models of Quantum Machine Learning
algorithms, and we compare them to classical models for the classification of a
dataset composed of malicious and benign executable files. We try to optimize
our algorithms based on methods found in the literature, and analyze our
results in an exploratory way, to identify the most interesting directions to
explore for the future.
( 2
min )
This paper proposes Meta-SAGE, a novel approach for improving the scalability
of deep reinforcement learning models for combinatorial optimization (CO)
tasks. Our method adapts pre-trained models to larger-scale problems in test
time by suggesting two components: a scale meta-learner (SML) and scheduled
adaptation with guided exploration (SAGE). First, SML transforms the context
embedding for subsequent adaptation of SAGE based on scale information. Then,
SAGE adjusts the model parameters dedicated to the context embedding for a
specific instance. SAGE introduces locality bias, which encourages selecting
nearby locations to determine the next location. The locality bias gradually
decays as the model is adapted to the target instance. Results show that
Meta-SAGE outperforms previous adaptation methods and significantly improves
scalability in representative CO tasks. Our source code is available at
https://github.com/kaist-silab/meta-sage
( 2
min )
Computer vision applications in transportation logistics and warehousing have
a huge potential for process automation. We present a structured literature
review on research in the field to help leverage this potential. The literature
is categorized w.r.t. the application, i.e., the task it tackles, and w.r.t. the
computer vision techniques that are used. Regarding applications, we subdivide
the literature in two areas: Monitoring, i.e. observing and retrieving relevant
information from the environment, and manipulation, where approaches are used
to analyze and interact with the environment. Additionally, we point out
directions for future research and link to recent developments in computer
vision that are suitable for application in logistics. Finally, we present an
overview of existing datasets and industrial solutions. The results of our
analysis are also available online at https://a-nau.github.io/cv-in-logistics.
( 2
min )
Large language models (LLMs) with memory are computationally universal.
However, mainstream LLMs are not taking full advantage of memory, and the
designs are heavily influenced by biological brains. Due to their approximate
nature and proneness to the accumulation of errors, conventional neural memory
mechanisms cannot support LLMs to simulate complex reasoning. In this paper, we
seek inspiration from modern computer architectures to augment LLMs with
symbolic memory for complex multi-hop reasoning. Such a symbolic memory
framework is instantiated as an LLM and a set of SQL databases, where the LLM
generates SQL instructions to manipulate the SQL databases. We validate the
effectiveness of the proposed memory framework on a synthetic dataset requiring
complex reasoning. The project website is available at
https://chatdatabase.github.io/ .
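The control loop is simple to sketch: the LLM emits SQL, an exact database executes it, and intermediate results never pass through lossy neural memory. Below, a hard-coded list of statements stands in for model-generated SQL, and the table and query are invented for illustration:

```python
import sqlite3

def run_with_symbolic_memory(instructions, conn):
    """Execute a sequence of model-generated SQL statements against an exact
    symbolic store.  `instructions` stands in for LLM output; in the real
    framework the model would generate these from natural language."""
    cur = conn.cursor()
    results = []
    for sql in instructions:
        cur.execute(sql)
        if sql.lstrip().upper().startswith("SELECT"):
            results.append(cur.fetchall())
    conn.commit()
    return results

conn = sqlite3.connect(":memory:")
# A multi-hop question ("how much did Alice spend in total?") becomes a
# chain of exact SQL operations rather than approximate memory reads.
generated_sql = [
    "CREATE TABLE orders (customer TEXT, amount REAL)",
    "INSERT INTO orders VALUES ('alice', 10.0), ('bob', 5.0), ('alice', 7.5)",
    "SELECT SUM(amount) FROM orders WHERE customer = 'alice'",
]
answers = run_with_symbolic_memory(generated_sql, conn)
```

Because each hop is an exact relational operation, errors do not accumulate across reasoning steps the way they can in approximate neural memory.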
( 2
min )
Automatic speech recognition (ASR) models are frequently exposed to data
distribution shifts in many real-world scenarios, leading to erroneous
predictions. To tackle this issue, an existing test-time adaptation (TTA)
method has recently been proposed to adapt the pre-trained ASR model on
unlabeled test instances without source data. Despite decent performance gain,
this work relies solely on naive greedy decoding and performs adaptation across
timesteps at a frame level, which may not be optimal given the sequential
nature of the model output. Motivated by this, we propose a novel TTA
framework, dubbed SGEM, for general ASR models. To treat the sequential output,
SGEM first exploits beam search to explore candidate output logits and selects
the most plausible one. Then, it utilizes generalized entropy minimization and
negative sampling as unsupervised objectives to adapt the model. SGEM achieves
state-of-the-art performance for three mainstream ASR models under various
domain shifts.
( 2
min )
Ultrasound imaging is one of the most prominent technologies to evaluate the
growth, progression, and overall health of a fetus during its gestation.
However, the interpretation of the data obtained from such studies is best left
to expert physicians and technicians who are trained and well-versed in
analyzing such images. To improve the clinical workflow and potentially develop
an at-home ultrasound-based fetal monitoring platform, we present a novel fetus
phantom ultrasound dataset, FPUS23, which can be used to identify (1) the
correct diagnostic planes for estimating fetal biometric values, (2) fetus
orientation, (3) their anatomical features, and (4) bounding boxes of the fetus
phantom anatomies at 23 weeks gestation. The entire dataset is composed of
15,728 images, which are used to train four different Deep Neural Network
models, built upon a ResNet34 backbone, for detecting the aforementioned fetus
features and use-cases. We have also evaluated the models trained using our
FPUS23 dataset, to show that the information learned by these models can be
used to substantially increase the accuracy on real-world ultrasound fetus
datasets. We make the FPUS23 dataset and the pre-trained models publicly
accessible at https://github.com/bharathprabakaran/FPUS23, which will further
facilitate future research on fetal ultrasound imaging and analysis.
( 3
min )
Control variates can be a powerful tool to reduce the variance of Monte Carlo
estimators, but constructing effective control variates can be challenging when
the number of samples is small. In this paper, we show that when a large number
of related integrals need to be computed, it is possible to leverage the
similarity between these integration tasks to improve performance even when the
number of samples per task is very small. Our approach, called meta learning
CVs (Meta-CVs), can be used for up to hundreds or thousands of tasks. Our
empirical assessment indicates that Meta-CVs can lead to significant variance
reduction in such settings, and our theoretical analysis establishes general
conditions under which Meta-CVs can be successfully trained.
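As background, the classical single-task control variate that Meta-CVs generalize works as follows: given a function g with known mean, subtract c(g(X) - E[g]) from f(X), with variance-optimal coefficient c* = Cov(f, g)/Var(g). A minimal single-task sketch (Meta-CVs instead meta-learn the control variate across many related tasks):

```python
import numpy as np

def cv_estimate(f_vals, g_vals, g_mean):
    """Control-variate estimate of E[f(X)] using g with known mean E[g(X)].
    The coefficient c* = Cov(f, g) / Var(g) minimizes the variance."""
    c = np.cov(f_vals, g_vals)[0, 1] / np.var(g_vals, ddof=1)
    return np.mean(f_vals - c * (g_vals - g_mean)), c

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
f = np.exp(x)            # target: E[exp(X)] = exp(1/2) for X ~ N(0, 1)
g = x                    # control variate with known mean E[X] = 0
est_cv, c = cv_estimate(f, g, g_mean=0.0)
est_mc = f.mean()        # plain Monte Carlo baseline
```

The variance of the corrected samples f - c·g is provably no larger than that of f alone, which is exactly the effect Meta-CVs aim to deliver when the per-task sample budget is tiny.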
( 2
min )
Deciding how to optimally deploy sensors in a large, complex, and spatially
extended structure is critical to ensure that the surface pressure field is
accurately captured for subsequent analysis and design. In some cases,
reconstruction of missing data is required in downstream tasks such as the
development of digital twins. This paper presents a data-driven sparse sensor
selection algorithm that aims to provide the most informative measurements for
parsimoniously reconstructing the aerodynamic characteristics of wind pressures
over tall building structures. The algorithm first fits a set of basis functions to
the training data, then applies a computationally efficient QR algorithm that
ranks existing pressure sensors in order of importance for reconstructing the
state in this tailored basis. The findings of this study show that the
proposed algorithm successfully reconstructs the aerodynamic characteristics of
tall buildings from sparse measurement locations, generating stable and optimal
solutions across a range of conditions. As a result, this study serves as a
promising first step toward leveraging the success of data-driven and machine
learning algorithms to supplement traditional genetic algorithms currently used
in wind engineering.
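The basis-then-QR pipeline described above can be sketched in a few lines: fit an r-mode SVD basis to training snapshots, then rank measurement locations by column-pivoted QR on the basis. This follows standard QR-pivot sensor placement; the paper's exact formulation may differ in details:

```python
import numpy as np
from scipy.linalg import qr, svd

def rank_sensors(X, r):
    """Rank measurement locations by column-pivoted QR on an r-mode SVD
    basis fit to training snapshots X (n_locations x n_snapshots)."""
    U, _, _ = svd(X, full_matrices=False)
    basis = U[:, :r]                        # tailored basis, n x r
    _, _, piv = qr(basis.T, pivoting=True)  # pivots = locations by importance
    return piv, basis

def reconstruct(y_sparse, sensors, basis):
    """Least-squares reconstruction of the full field from sparse readings."""
    coef, *_ = np.linalg.lstsq(basis[sensors], y_sparse, rcond=None)
    return basis @ coef

rng = np.random.default_rng(1)
# Synthetic rank-3 pressure field: 30 locations, 50 snapshots.
snapshots = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 50))
piv, basis = rank_sensors(snapshots, r=3)
sensors = piv[:3]                           # keep only the top-ranked locations
field = reconstruct(snapshots[sensors, 0], sensors, basis)
```

Because the synthetic field is exactly rank 3, r sensors chosen by the pivoting suffice to reconstruct a held snapshot essentially exactly.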
( 2
min )
In this paper, we propose a Boosting Tail Neural Network (BTNN) for improving
the performance of Realtime Custom Keyword Spotting (RCKS), which remains an
industrial challenge because it demands powerful classification ability under
limited computation resources. Our approach is inspired by brain science, in
which a brain is only partly activated by a given nerve stimulus, and by the
many machine learning algorithms that combine a batch of weak classifiers to
resolve arduous problems, a strategy that has often proved effective. We show
that this method is helpful for the RCKS problem, and the proposed approach
achieves better performance in terms of wake-up rate and false alarms.
In our experiments, it achieves an 18\% relative improvement over traditional
algorithms that use only one strong classifier. We also point out that this
approach may be promising for future ASR exploration.
( 2
min )
We present AnalogVNN, a simulation framework built on PyTorch that can simulate
effects of optoelectronic noise, limited precision, and signal normalization
present in photonic neural network accelerators. We use this framework to train
and optimize linear and convolutional neural networks with up to 9 layers and
~1.7 million parameters, while gaining insights into how normalization,
activation function, reduced precision, and noise influence accuracy in analog
photonic neural networks. By following the same layer structure design present
in PyTorch, the AnalogVNN framework allows users to convert most digital neural
network models to their analog counterparts with just a few lines of code,
taking full advantage of the open-source optimization, deep learning, and GPU
acceleration libraries available through PyTorch. Code is available at
https://analogvnn.github.io
( 2
min )
Information on natural phenomena and engineering systems is typically
contained in data. Data can be corrupted by systematic errors in models and
experiments. In this paper, we propose a tool to uncover the spatiotemporal
solution of the underlying physical system by removing the systematic errors
from data. The tool is the physics-constrained convolutional neural network
(PC-CNN), which combines information from both the systems governing equations
and data. We focus on fundamental phenomena that are modelled by partial
differential equations, such as linear convection, Burgers equation, and
two-dimensional turbulence. First, we formulate the problem, describe the
physics-constrained convolutional neural network, and parameterise the
systematic error. Second, we uncover the solutions from data corrupted by large
multimodal systematic errors. Third, we perform a parametric study for
different systematic errors. We show that the method is robust. Fourth, we
analyse the physical properties of the uncovered solutions. We show that the
solutions inferred from the PC-CNN are physical, in contrast to the data
corrupted by systematic errors, which do not fulfil the governing equations.
This work opens opportunities for removing epistemic errors from models, and
systematic errors from measurements.
( 2
min )
We study robustness to test-time adversarial attacks in the regression
setting with $\ell_p$ losses and arbitrary perturbation sets. We address the
question of which function classes are PAC learnable in this setting. We show
that classes of finite fat-shattering dimension are learnable in both
realizable and agnostic settings. Moreover, for convex function classes, they
are even properly learnable. In contrast, some non-convex function classes
provably require improper learning algorithms. Our main technique is based on a
construction of an adversarially robust sample compression scheme of a size
determined by the fat-shattering dimension. Along the way, we introduce a novel
agnostic sample compression scheme for real-valued functions, which may be of
independent interest.
( 2
min )
Principal components analysis (PCA) is a fundamental algorithm in data
analysis. Its memory-restricted online versions are useful in many modern
applications, where the data are too large to fit in memory, or when data
arrive as a stream of items. In this paper, we propose ROIPCA and fROIPCA, two
online PCA algorithms that are based on rank-one updates. While ROIPCA is
typically more accurate, fROIPCA is faster and has comparable accuracy. We show
the relation between fROIPCA and an existing popular gradient algorithm for
online PCA, and in particular, prove that fROIPCA is in fact a gradient
algorithm with an optimal learning rate. We demonstrate numerically the
advantages of our algorithms over existing state-of-the-art algorithms in terms
of accuracy and runtime.
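For context, the "existing popular gradient algorithm" family the abstract refers to can be illustrated with an Oja-style update, where each arriving sample triggers a cheap rank-one-flavored gradient step followed by re-orthonormalization. This is a sketch of that classical baseline, not of ROIPCA or fROIPCA themselves:

```python
import numpy as np

def oja_online_pca(stream, k, lr=0.05):
    """Oja-style online PCA: process one sample at a time, so the full data
    never needs to fit in memory.  Each step is a gradient update of the
    estimated top-k subspace plus a QR re-orthonormalization."""
    d = stream.shape[1]
    rng = np.random.default_rng(0)
    W, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for x in stream:
        y = W.T @ x
        W += lr * np.outer(x, y)        # gradient step on the Rayleigh quotient
        W, _ = np.linalg.qr(W)          # keep columns orthonormal
    return W

rng = np.random.default_rng(2)
v = np.zeros(5); v[0] = 1.0             # dominant direction of the stream
stream = rng.standard_normal((500, 1)) * v + 0.1 * rng.standard_normal((500, 5))
W = oja_online_pca(stream, k=1)
```

On a stream with one dominant direction, the estimated component aligns closely with it; the learning-rate choice is exactly the knob for which fROIPCA is shown to be optimal.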
( 2
min )
We provide new estimates of an asymptotic upper bound on the entropy of
English using the large language model LLaMA-7B as a predictor for the next
token given a window of past tokens. This estimate is significantly smaller
than currently available estimates in \cite{cover1978convergent},
\cite{lutati2023focus}. A natural byproduct is an algorithm for lossless
compression of English text which combines the prediction from the large
language model with a lossless compression scheme. Preliminary results from
limited experiments suggest that our scheme outperforms state-of-the-art text
compression schemes such as BSC, ZPAQ, and paq8h.
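The underlying estimate is the model's cross-entropy in bits per token, which by Shannon's source coding theorem is also the rate an arithmetic coder driven by the model would achieve. A toy sketch, with a smoothed bigram model standing in for LLaMA-7B:

```python
import math
from collections import Counter, defaultdict

def bits_per_token(tokens, prob):
    """Cross-entropy (bits/token) of a predictive model on a token stream:
    the average of -log2 p(token | context)."""
    return sum(-math.log2(prob(ctx, tok))
               for ctx, tok in zip(tokens, tokens[1:])) / (len(tokens) - 1)

def bigram_model(train, alpha=1.0):
    """Add-alpha smoothed bigram predictor trained on `train`
    (a stand-in for the LLM's next-token distribution)."""
    vocab = sorted(set(train))
    counts = defaultdict(Counter)
    for a, b in zip(train, train[1:]):
        counts[a][b] += 1
    def prob(ctx, tok):
        c = counts[ctx]
        return (c[tok] + alpha) / (sum(c.values()) + alpha * len(vocab))
    return prob

train = list("ab" * 50)
model = bigram_model(train)
bpt = bits_per_token(list("abababab"), model)                      # good predictor
uniform_bpt = bits_per_token(list("abababab"), lambda c, t: 0.5)   # ignorant predictor
```

A better predictor yields fewer bits per token, which is exactly why pairing a strong language model with a lossless entropy coder compresses text well.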
( 2
min )
We present a comprehensive analysis of quantitatively evaluating explainable
artificial intelligence (XAI) techniques for remote sensing image
classification. Our approach leverages state-of-the-art machine learning
approaches to perform remote sensing image classification across multiple
modalities. We investigate the results of the models qualitatively through XAI
methods. Additionally, we compare the XAI methods quantitatively through
various categories of desired properties. Through our analysis, we offer
insights and recommendations for selecting the most appropriate XAI method(s)
to gain a deeper understanding of the models' decision-making processes. The
code for this work is publicly available.
( 2
min )
In this paper, we examine the problem of partial inference in the context of
structured prediction. Using a generative model approach, we consider the task
of maximizing a score function with unary and pairwise potentials in the space
of labels on graphs. Employing a two-stage convex optimization algorithm for
label recovery, we analyze the conditions under which a majority of the labels
can be recovered. We introduce a novel perspective on the Karush-Kuhn-Tucker
(KKT) conditions and primal and dual construction, and provide statistical and
topological requirements for partial recovery with provable guarantees.
( 2
min )
The Japanese writing system is complex, with three character types of
Hiragana, Katakana, and Kanji. Kanji consists of thousands of unique
characters, further adding to the complexity of character identification and
literature understanding. Being able to translate handwritten Japanese
characters into digital text is useful for data analysis, translation, learning
and cultural preservation. In this study, a machine learning approach to
analyzing and recognizing handwritten Japanese characters (Kanji) is proposed.
The study used an ensemble of three convolutional neural networks (CNNs) for
recognizing handwritten Kanji characters and utilized four datasets of MNIST,
K-MNIST, Kuzushiji-49 (K49) and the top 150 represented classes in the
Kuzushiji-Kanji (K-Kanji) dataset for its performance evaluation. The results
indicate the feasibility of using the proposed CNN-ensemble architecture for
recognizing handwritten characters, achieving 99.4%, 96.4%, 95.0% and 96.4%
classification accuracy on the MNIST, K-MNIST, K49, and K-Kanji datasets
respectively.
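The abstract does not specify how the three CNNs are combined; a common choice, assumed here purely for illustration, is soft voting, i.e. averaging the softmax outputs of the member networks and taking the argmax:

```python
import numpy as np

def ensemble_predict(logits_list):
    """Soft-voting ensemble: convert each model's logits to softmax
    probabilities, average across models, and take the argmax class.
    (An assumed combination rule, not one stated in the abstract.)"""
    probs = [np.exp(l - l.max(axis=1, keepdims=True)) for l in logits_list]
    probs = [p / p.sum(axis=1, keepdims=True) for p in probs]
    return np.mean(probs, axis=0).argmax(axis=1)

# Two of three models favor class 1; the ensemble follows the majority.
preds = ensemble_predict([np.array([[0.0, 2.0]]),
                          np.array([[0.0, 1.0]]),
                          np.array([[3.0, 0.0]])])
```

Averaging probabilities (rather than hard votes) lets confident members outweigh uncertain ones, which is often where ensembles of similar CNNs gain their accuracy.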
( 2
min )
Approximate inference in Gaussian process (GP) models with non-conjugate
likelihoods gets entangled with the learning of the model hyperparameters. We
improve hyperparameter learning in GP models and focus on the interplay between
variational inference (VI) and the learning target. While VI's lower bound to
the marginal likelihood is a suitable objective for inferring the approximate
posterior, we show that a direct approximation of the marginal likelihood as in
Expectation Propagation (EP) is a better learning objective for hyperparameter
optimization. We design a hybrid training procedure to bring the best of both
worlds: it leverages conjugate-computation VI for inference and uses an EP-like
marginal likelihood approximation for hyperparameter learning. We compare VI,
EP, Laplace approximation, and our proposed training procedure and empirically
demonstrate the effectiveness of our proposal across a wide range of data sets.
( 2
min )
Despite progress in the field, significant parts of current XAI research are
still not on solid conceptual, ethical, or methodological grounds.
Unfortunately, these unfounded parts are not on the decline but continue to
grow. Many explanation techniques are still proposed without clarifying their
purpose. Instead, they are advertised with ever more fancy-looking heatmaps or
only seemingly relevant benchmarks. Moreover, explanation techniques are
motivated with questionable goals, such as building trust, or rely on strong
assumptions about the 'concepts' that deep learning algorithms learn. In this
paper, we highlight and discuss these and other misconceptions in current XAI
research. We also suggest steps to make XAI a more substantive area of
research.
( 2
min )
We show how to "compile" human-readable programs into standard decoder-only
transformer models. Our compiler, Tracr, generates models with known structure.
This structure can be used to design experiments. For example, we use it to
study "superposition" in transformers that execute multi-step algorithms.
Additionally, the known structure of Tracr-compiled models can serve as
ground-truth for evaluating interpretability methods. Commonly, because the
"programs" learned by transformers are unknown it is unclear whether an
interpretation succeeded. We demonstrate our approach by implementing and
examining programs including computing token frequencies, sorting, and
parenthesis checking. We provide an open-source implementation of Tracr at
https://github.com/deepmind/tracr.
( 2
min )
It is often very challenging to manually design reward functions for complex,
real-world tasks. To solve this, one can instead use reward learning to infer a
reward function from data. However, there are often multiple reward functions
that fit the data equally well, even in the infinite-data limit. This means
that the reward function is only partially identifiable. In this work, we
formally characterise the partial identifiability of the reward function given
several popular reward learning data sources, including expert demonstrations
and trajectory comparisons. We also analyse the impact of this partial
identifiability for several downstream tasks, such as policy optimisation. We
unify our results in a framework for comparing data sources and downstream
tasks by their invariances, with implications for the design and selection of
data sources for reward learning.
( 2
min )
Riemannian submanifold optimization with momentum is computationally
challenging because, to ensure that the iterates remain on the submanifold, we
often need to solve difficult differential equations. Here, we simplify such
difficulties for a class of structured symmetric positive-definite matrices
with the affine-invariant metric. We do so by proposing a generalized version
of the Riemannian normal coordinates that dynamically orthonormalizes the
metric and locally converts the problem into an unconstrained problem in the
Euclidean space. We use our approach to simplify existing approaches for
structured covariances and develop matrix-inverse-free $2^\text{nd}$-order
optimizers for deep learning in low precision settings.
Code: https://github.com/yorkerlin/StructuredNGD-DL
( 2
min )
Prediction models are typically optimized independently from decision
optimization. A smart predict then optimize (SPO) framework optimizes
prediction models to minimize downstream decision regret. In this paper we
present dboost, the first general purpose implementation of smart gradient
boosting for `predict, then optimize' problems. The framework supports convex
quadratic cone programming and gradient boosting is performed by implicit
differentiation of a custom fixed-point mapping. Experiments comparing with
state-of-the-art SPO methods show that dboost can further reduce out-of-sample
decision regret.
( 2
min )
Empirical neural tangent kernels (eNTKs) can provide a good understanding of
a given network's representation: they are often far less expensive to compute
and applicable more broadly than infinite width NTKs. For networks with O
output units (e.g. an O-class classifier), however, the eNTK on N inputs is of
size $NO \times NO$, taking $O((NO)^2)$ memory and up to $O((NO)^3)$
computation. Most existing applications have therefore used one of a handful of
approximations yielding $N \times N$ kernel matrices, saving orders of
magnitude of computation, but with limited to no justification. We prove that
one such approximation, which we call "sum of logits", converges to the true
eNTK at initialization for any network with a wide final "readout" layer. Our
experiments demonstrate the quality of this approximation for various uses
across a range of settings.
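The "sum of logits" approximation replaces the per-logit parameter gradients with the gradient of the summed logits, shrinking the kernel from NO x NO to N x N. A small numerical sketch with finite-difference gradients and a toy forward function (for this linear toy the exact answer is 2·XXᵀ, since both logits share the same input gradient):

```python
import numpy as np

def num_grad(f, params, eps=1e-5):
    """Numerical gradient of a scalar function via central differences."""
    g = np.zeros_like(params)
    for i in range(params.size):
        p = params.copy(); p.flat[i] += eps; fp = f(p)
        p.flat[i] -= 2 * eps; fm = f(p)
        g.flat[i] = (fp - fm) / (2 * eps)
    return g

def sum_of_logits_entk(forward, params, X):
    """N x N 'sum of logits' eNTK: Gram matrix of gradients of
    sum_o f_o(x) w.r.t. the parameters, one gradient per input."""
    G = np.stack([num_grad(lambda p: forward(p, x).sum(), params).ravel()
                  for x in X])
    return G @ G.T

rng = np.random.default_rng(4)
params = rng.standard_normal(6)
X = rng.standard_normal((4, 3))
fwd = lambda p, x: p.reshape(2, 3) @ x   # toy 2-logit linear "network"
K = sum_of_logits_entk(fwd, params, X)   # 4 x 4 instead of 8 x 8
```

The memory saving is the point: N x N entries instead of (NO)², at the cost of the approximation the paper justifies for wide readout layers.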
( 2
min )
We propose a novel Bayesian inference framework for distributed
differentially private linear regression. We consider a distributed setting
where multiple parties hold parts of the data and share certain summary
statistics of their portions in privacy-preserving noise. We develop a novel
generative statistical model for privately shared statistics, which exploits a
useful distributional relation between the summary statistics of linear
regression. Bayesian estimation of the regression coefficients is conducted
mainly using Markov chain Monte Carlo algorithms, while we also provide a fast
version to perform Bayesian estimation in one iteration. The proposed methods
have computational advantages over their competitors. We provide numerical
results on both real and simulated data, which demonstrate that the proposed
algorithms provide well-rounded estimation and prediction.
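The flavor of the setup can be sketched as follows: each party perturbs its sufficient statistics X^T X and X^T y with Gaussian noise, and the server pools the shares to solve the normal equations. This is a point-estimate sketch only; the paper's contribution is the Bayesian generative model and MCMC over such shared statistics, and calibrating the noise scale to a formal DP budget is omitted here:

```python
import numpy as np

def share_stats(X, y, sigma, rng):
    """One party's privacy-preserving summary: sufficient statistics
    X^T X and X^T y perturbed with Gaussian noise (Gaussian-mechanism
    style; the sigma-to-epsilon calibration is deliberately omitted)."""
    d = X.shape[1]
    noise = rng.standard_normal((d, d)) * sigma
    S = X.T @ X + (noise + noise.T) / 2          # symmetric noise
    z = X.T @ y + rng.standard_normal(d) * sigma
    return S, z

def aggregate_fit(shares, ridge=1e-3):
    """Server: pool the noisy statistics and solve the normal equations."""
    S = sum(s for s, _ in shares)
    z = sum(z for _, z in shares)
    return np.linalg.solve(S + ridge * np.eye(S.shape[0]), z)

rng = np.random.default_rng(3)
beta = np.array([1.0, -2.0, 0.5])
shares = []
for _ in range(3):                               # three parties, local data only
    X = rng.standard_normal((300, 3))
    y = X @ beta + 0.1 * rng.standard_normal(300)
    shares.append(share_stats(X, y, 0.5, rng))
beta_hat = aggregate_fit(shares)
```

Only the noisy d x d and d-vector summaries ever leave a party, which is what makes the approach communication- and privacy-friendly.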
( 2
min )
Training large language models (LLMs) with billions of parameters can be challenging. In addition to designing the model architecture, researchers need to set up state-of-the-art training techniques for distributed training like mixed precision support, gradient accumulation, and checkpointing. With large models, the training setup is even more challenging because the available memory in a single […]
( 7
min )
You can now retrain machine learning (ML) models and automate batch prediction workflows with updated datasets in Amazon SageMaker Canvas, thereby making it easier to constantly learn and improve the model performance and drive efficiency. An ML model’s effectiveness depends on the quality and relevance of the data it’s trained on. As time progresses, the […]
( 10
min )
Amazon Lex is excited to announce Test Workbench, a new bot testing solution that provides tools to simplify and automate the bot testing process. During bot development, testing is the phase where developers check whether a bot meets the specific requirements, needs and expectations by identifying errors, defects, or bugs in the system before scaling. […]
( 9
min )
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. Amazon Textract has a Tables feature within the AnalyzeDocument API that offers the ability to automatically extract tabular structures from any document. In this post, we discuss the improvements made to the Tables feature and […]
( 9
min )
This blog post is co-written with Dr. Ebtesam Almazrouei, Executive Director–Acting Chief AI Researcher of the AI-Cross Center Unit and Project Lead for LLM Projects at TII. United Arab Emirate’s (UAE) Technology Innovation Institute (TII), the applied research pillar of Abu Dhabi’s Advanced Technology Research Council, has launched Falcon LLM, a foundational large language model […]
( 10
min )
In the latest episode of NVIDIA’s AI Podcast, Anant Agarwal, founder of edX and chief platform officer at 2U, shared his vision for the future of online education and how AI is revolutionizing the learning experience. Agarwal, a strong advocate for massive open online courses, or MOOCs, discussed the importance of accessibility and quality in Read article >
( 4
min )
Getting discharged from the hospital is a major milestone for patients — but sometimes, it’s not the end of their road to recovery. Nearly 15% of hospital patients in the U.S. are readmitted within 30 days of their initial discharge, which is often associated with worse outcomes and higher costs for both patients and hospitals. Read article >
( 6
min )
In this issue: Peter Lee discusses AI in medicine. Plus, new research on data inference privacy in machine learning; PII leakage in language models; and automatic prompt organization with gradient descent and beam search.
The post Research Focus: Week of June 5, 2023 appeared first on Microsoft Research.
( 11
min )
Can you imagine a world where healthcare is more accessible, affordable, and efficient? Conversational AI is making this vision a reality. With the help of natural language processing (NLP) and machine learning (ML), conversational AI is transforming the way healthcare providers interact with patients. From scheduling appointments to monitoring health conditions, conversational AI has numerous… Read More »The impact of conversational AI on healthcare outcomes and patient satisfaction
The post The impact of conversational AI on healthcare outcomes and patient satisfaction appeared first on Data Science Central.
( 22
min )
In recent years, the web development industry has shifted towards Progressive Web Apps (PWAs) as the future of web development. PWAs are web applications that provide users with an app-like experience on their mobile devices. They do not have to download or install a separate native app. This emerging technology provides several benefits, including faster… Read More »Why are progressive web apps becoming the future of web development?
The post Why are progressive web apps becoming the future of web development? appeared first on Data Science Central.
( 22
min )
It has been reported that clustering-based topic models, which cluster
high-quality sentence embeddings with an appropriate word selection method, can
generate better topics than generative probabilistic topic models. However,
these approaches suffer from the difficulty of selecting appropriate parameters
and from incomplete models that overlook the quantitative relations between
words and topics and between topics and texts. To solve these issues, we propose graph to topic
(G2T), a simple but effective framework for topic modelling. The framework is
composed of four modules. First, document representation is acquired using
pretrained language models. Second, a semantic graph is constructed according
to the similarity between document representations. Third, communities in
document semantic graphs are identified, and the relationship between topics
and documents is quantified accordingly. Fourth, the word--topic distribution
is computed based on a variant of TFIDF. Automatic evaluation suggests that G2T
achieved state-of-the-art performance on both English and Chinese documents
of varying lengths.
( 2
min )
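The middle two modules of a G2T-style pipeline (similarity graph, then community detection) might look like the sketch below. This is an illustrative stand-in, not the paper's implementation: the embeddings are toy vectors, and connected components of a thresholded cosine-similarity graph substitute for whatever community-detection algorithm G2T actually uses.

```python
import numpy as np

def cosine_sim(X):
    # Pairwise cosine similarity between row vectors.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def graph_communities(S, threshold=0.8):
    """Connected components of the thresholded similarity graph
    (a simple stand-in for the community-detection step)."""
    n = S.shape[0]
    adj = S >= threshold
    labels = -np.ones(n, dtype=int)
    comp = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        stack = [start]
        labels[start] = comp
        while stack:
            u = stack.pop()
            for v in np.nonzero(adj[u])[0]:
                if labels[v] == -1:
                    labels[v] = comp
                    stack.append(v)
        comp += 1
    return labels

# Toy "document embeddings": two tight clusters of two documents each.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = graph_communities(cosine_sim(X), threshold=0.9)
print(labels)  # two communities: [0 0 1 1]
```

In the full framework, the word-topic distribution would then be computed per community with a TF-IDF variant.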
We propose causal isotonic calibration, a novel nonparametric method for
calibrating predictors of heterogeneous treatment effects. Furthermore, we
introduce cross-calibration, a data-efficient variant of calibration that
eliminates the need for hold-out calibration sets. Cross-calibration leverages
cross-fitted predictors and generates a single calibrated predictor using all
available data. Under weak conditions that do not assume monotonicity, we
establish that both causal isotonic calibration and cross-calibration achieve
fast doubly-robust calibration rates, as long as either the propensity score or
outcome regression is estimated accurately in a suitable sense. The proposed
causal isotonic calibrator can be wrapped around any black-box learning
algorithm, providing robust and distribution-free calibration guarantees while
preserving predictive performance.
( 2
min )
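The core isotonic step of such a calibrator can be sketched with a pool-adjacent-violators (PAVA) fit: sort by the predicted effect, isotonic-fit the outcomes, and read the fitted values back as calibrated predictions. Note the generic `pseudo_outcome` vector below is a placeholder for the doubly-robust pseudo-outcomes the method actually uses.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: non-decreasing least-squares fit to y."""
    out = []  # blocks of [mean, weight, count]
    for v in np.asarray(y, dtype=float):
        out.append([v, 1.0, 1])
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, w2, c2 = out.pop()
            m1, w1, c1 = out.pop()
            w = w1 + w2
            out.append([(m1 * w1 + m2 * w2) / w, w, c1 + c2])
    return np.concatenate([[m] * c for m, _, c in out])

def isotonic_calibrate(pred, pseudo_outcome):
    """Sort by prediction, isotonic-fit the (pseudo-)outcomes, and return
    calibrated values in the original order."""
    order = np.argsort(pred)
    fit_sorted = pava(np.asarray(pseudo_outcome)[order])
    cal = np.empty_like(fit_sorted)
    cal[order] = fit_sorted
    return cal

pred = np.array([0.1, 0.4, 0.2, 0.9])
pseudo = np.array([0.0, 0.5, 0.3, 0.8])
cal = isotonic_calibrate(pred, pseudo)
print(cal)
```

Because isotonic regression is distribution-free, this wrapper can sit on top of any black-box effect estimator, as the abstract notes.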
We introduce Brain-Inspired Modular Training (BIMT), a method for making
neural networks more modular and interpretable. Inspired by brains, BIMT embeds
neurons in a geometric space and augments the loss function with a cost
proportional to the length of each neuron connection. We demonstrate that BIMT
discovers useful modular neural networks for many simple tasks, revealing
compositional structures in symbolic formulas, interpretable decision
boundaries and features for classification, and mathematical structure in
algorithmic datasets. The ability to directly see modules with the naked eye
can complement current mechanistic interpretability strategies such as probes,
interventions or staring at all weights.
( 2
min )
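The length-based regularizer at the heart of BIMT can be illustrated in a few lines: give every neuron a fixed coordinate and penalize each weight by the Euclidean length of its connection. The 2D layout and the |w| x length penalty below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def connection_cost(W, pos_in, pos_out):
    """Sum over connections of |weight| * connection length.
    W: (n_out, n_in) weights; pos_*: (n, 2) neuron coordinates."""
    dist = np.linalg.norm(pos_out[:, None, :] - pos_in[None, :, :], axis=-1)
    return np.sum(np.abs(W) * dist)

rng = np.random.default_rng(0)
pos_in = np.stack([np.arange(3.0), np.zeros(3)], axis=1)   # input layer at y=0
pos_out = np.stack([np.arange(2.0), np.ones(2)], axis=1)   # next layer at y=1
W = rng.normal(size=(2, 3))
loss_penalty = connection_cost(W, pos_in, pos_out)
print(loss_penalty)  # scalar added to the task loss during training
```

Minimizing the task loss plus this penalty pushes useful weights onto short, local connections, which is what makes the learned modules visible to the naked eye.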
In many machine learning applications, labeling datasets can be an arduous
and time-consuming task. Although research has shown that semi-supervised
learning techniques can achieve high accuracy with very few labels within the
field of computer vision, little attention has been given to how images within
a dataset should be selected for labeling. In this paper, we propose a novel
approach based on well-established self-supervised learning, clustering, and
manifold learning techniques to address the challenge of selecting an
informative image subset to label in the first instance, which is known as the
cold-start or unsupervised selective labelling problem. We test our approach
using several publicly available datasets, namely CIFAR10, Imagenette,
DeepWeeds, and EuroSAT, and observe improved performance with both supervised
and semi-supervised learning strategies when our label selection strategy is
used, in comparison to random sampling. We also obtain superior performance for
the datasets considered with a much simpler approach compared to other methods
in the literature.
( 2
min )
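A common baseline for this cold-start selection problem is greedy diverse sampling in embedding space, sketched below. Farthest-point selection here is a generic stand-in for the paper's clustering/manifold-based strategy, and the "embeddings" are a toy three-cluster dataset.

```python
import numpy as np

def farthest_point_selection(X, k, first=0):
    """Greedy diverse subset selection: repeatedly pick the point farthest
    (in minimum distance) from everything already selected."""
    chosen = [first]
    d = np.linalg.norm(X - X[first], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(d))
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

# Toy self-supervised embeddings: three well-separated pairs.
X = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5], [10, 0], [10, 0.1]])
picked = farthest_point_selection(X, k=3)
print(picked)  # one representative per cluster
```

The selected indices would then be sent for labeling before supervised or semi-supervised training begins.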
Machine learning techniques are effective for building predictive models
because they identify patterns in large datasets. Development of a model for
complex real-life problems often stops at the point of publication, proof of
concept, or when made accessible through some mode of deployment. However, a
model in the medical domain risks becoming obsolete as patient demographics,
systems and clinical practices change. The maintenance and monitoring of
predictive model performance post-publication is crucial to enable their safe
and effective long-term use. We will assess the infrastructure required to
monitor the outputs of a machine learning algorithm, and present two scenarios
with examples of monitoring and updates of models, firstly on a breast cancer
prognosis model trained on public longitudinal data, and secondly on a
neurodegenerative stratification algorithm that is currently being developed
and tested in clinic.
( 2
min )
Recent work has shown that forward- and reverse-mode automatic
differentiation (AD) over the reals is almost always correct in a
mathematically precise sense. However, actual programs work with
machine-representable numbers (e.g., floating-point numbers), not reals. In
this paper, we study the correctness of AD when the parameter space of a neural
network consists solely of machine-representable numbers. In particular, we
analyze two sets of parameters on which AD can be incorrect: the incorrect set
on which the network is differentiable but AD does not compute its derivative,
and the non-differentiable set on which the network is non-differentiable. For
a neural network with bias parameters, we first prove that the incorrect set is
always empty. We then prove a tight bound on the size of the non-differentiable
set, which is linear in the number of non-differentiabilities in activation
functions, and give a simple necessary and sufficient condition for a parameter
to be in this set. We further prove that AD always computes a Clarke
subderivative even on the non-differentiable set. We also extend these results
to neural networks possibly without bias parameters.
( 2
min )
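The behaviour the paper analyzes can be illustrated with a tiny forward-mode AD built on dual numbers (a sketch, not the paper's framework): in a smooth region AD returns the true derivative, while at a ReLU kink the value it returns (here the convention `relu'(0) = 0`, one element of the subdifferential [0, 1]) yields a Clarke subderivative of the composite function.

```python
class Dual:
    """Value/derivative pair for forward-mode AD."""
    def __init__(self, val, dot):
        self.val, self.dot = val, dot
    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

def relu(x):
    # Convention: derivative 0 at the kink; any value in [0, 1] would be
    # a valid Clarke subderivative of relu there.
    return Dual(max(x.val, 0.0), x.dot if x.val > 0.0 else 0.0)

def f(x):
    return relu(x * x + Dual(-1.0, 0.0))  # relu(x^2 - 1)

print(f(Dual(2.0, 1.0)).dot)  # smooth region: derivative 4.0
print(f(Dual(1.0, 1.0)).dot)  # kink (x^2 - 1 = 0): AD returns 0.0
```

At x = 1 the one-sided derivatives of f are 0 and 2, so the 0.0 that AD returns is indeed in the Clarke subdifferential [0, 2], consistent with the paper's result.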
Previous pitch-controllable text-to-speech (TTS) models rely on directly
modeling fundamental frequency, leading to low variance in synthesized speech.
To address this issue, we propose PITS, an end-to-end pitch-controllable TTS
model that utilizes variational inference to model pitch. Based on VITS, PITS
incorporates the Yingram encoder, the Yingram decoder, and adversarial training
of pitch-shifted synthesis to achieve pitch-controllability. Experiments
demonstrate that PITS generates high-quality speech that is indistinguishable
from ground truth speech and has high pitch-controllability without quality
degradation. Code, audio samples, and demo are available at
https://github.com/anonymous-pits/pits.
( 2
min )
Existing style transfer algorithms work by minimizing a hybrid loss
function that pushes the generated image toward high similarity in both
content and style. However, this type of approach cannot guarantee visual
fidelity, i.e., the generated artworks should be indistinguishable from real
ones. In this paper, we devise a new style transfer framework called QuantArt
for high visual-fidelity stylization. QuantArt pushes the latent representation
of the generated artwork toward the centroids of the real artwork distribution
with vector quantization. By fusing the quantized and continuous latent
representations, QuantArt allows flexible control over the generated artworks
in terms of content preservation, style similarity, and visual fidelity.
Experiments on various style transfer settings show that our QuantArt framework
achieves significantly higher visual fidelity compared with the existing style
transfer methods.
( 2
min )
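The two mechanisms the abstract describes, nearest-centroid vector quantization and fusion of quantized with continuous latents, can be sketched as follows. The convex-combination fusion rule is an illustrative assumption; QuantArt's actual fusion may differ.

```python
import numpy as np

def quantize(z, codebook):
    """Snap each latent vector to its nearest codebook centroid."""
    d = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)
    return codebook[np.argmin(d, axis=1)]

def fuse(z_cont, z_quant, alpha):
    # alpha=1 keeps the quantized (high-fidelity) code,
    # alpha=0 keeps the continuous one.
    return alpha * z_quant + (1 - alpha) * z_cont

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])  # "centroids of real artworks"
z = np.array([[0.2, 0.1], [0.8, 0.9]])         # generated-image latents
zq = quantize(z, codebook)
print(zq)                # [[0. 0.] [1. 1.]]
print(fuse(z, zq, 0.5))  # halfway between continuous and quantized
```

Sliding `alpha` is how such a scheme trades content preservation against visual fidelity.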
Recent developments in Deep Learning (DL) suggest a vast potential for
Topology Optimization (TO). However, while there are some promising attempts,
the subfield still lacks a firm footing regarding basic methods and datasets.
We aim to address both points. First, we explore physics-based preprocessing
and equivariant networks to create sample-efficient components for TO DL
pipelines. We evaluate them in a large-scale ablation study using end-to-end
supervised training. The results demonstrate a drastic improvement in sample
efficiency and the predictions' physical correctness. Second, to improve
comparability and future progress, we publish the first two TO datasets
containing problems and corresponding ground truth solutions.
( 2
min )
Fault diagnosis is a crucial area of research in industry. Industrial
processes exhibit diverse operating conditions, where data often have
non-Gaussian, multi-mode, and center-drift characteristics. Data-driven
approaches are currently the main focus in the field, but continuous fault
classification and parameter updates of fault classifiers pose challenges for
multiple operating modes and real-time settings. Thus, a pressing issue is to
achieve real-time multi-mode fault diagnosis in industrial systems. In this
paper, we propose a novel approach that addresses this problem for industrial
applications. Our approach uses an extended evidence reasoning (ER) algorithm to
fuse information and merge outputs from different base classifiers. These base
classifiers, built on the broad learning system (BLS), are trained to ensure maximum
fault diagnosis accuracy. Furthermore, pseudo-label learning is used to update
model parameters in real-time. The effectiveness of the proposed approach is
demonstrated on the multi-mode Tennessee Eastman process dataset.
( 2
min )
We introduce a new methodology dubbed ``safe peeling'' to accelerate the
resolution of L0-regularized least-squares problems via a Branch-and-Bound
(BnB) algorithm. Our procedure makes it possible to tighten the convex relaxation
considered at each node of the BnB decision tree and therefore potentially
allows for more aggressive pruning. Numerical simulations show that our
proposed methodology leads to significant gains in terms of number of nodes
explored and overall solving time.
( 2
min )
Multivariate probabilistic time series forecasts are commonly evaluated via
proper scoring rules, i.e., functions that are minimized in expectation by the
ground-truth distribution. However, this property is not sufficient to
guarantee good discrimination in the non-asymptotic regime. In this paper, we
provide the first systematic finite-sample study of proper scoring rules for
time-series forecasting evaluation. Through a power analysis, we identify the
"region of reliability" of a scoring rule, i.e., the set of practical
conditions where it can be relied on to identify forecasting errors. We carry
out our analysis on a comprehensive synthetic benchmark, specifically designed
to test several key discrepancies between ground-truth and forecast
distributions, and we gauge the generalizability of our findings to real-world
tasks with an application to an electricity production problem. Our results
reveal critical shortcomings in the evaluation of multivariate probabilistic
forecasts as commonly performed in the literature.
( 2
min )
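One proper scoring rule such a study would cover is the energy score, which is cheap to estimate from forecast samples. The sketch below is a minimal Monte-Carlo estimator; the "good" and "bad" forecasts are toy stand-ins for the discrepancies the benchmark tests.

```python
import numpy as np

def energy_score(forecast_samples, obs):
    """Monte-Carlo energy score ES = E||X - y|| - 0.5 E||X - X'||,
    estimated from forecast samples X_i and one observation y.
    Lower is better; it is proper for multivariate forecasts."""
    X = np.asarray(forecast_samples, dtype=float)
    y = np.asarray(obs, dtype=float)
    m = len(X)
    term1 = np.mean(np.linalg.norm(X - y, axis=1))
    pd = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return term1 - 0.5 * np.sum(pd) / (m * m)

rng = np.random.default_rng(0)
good = rng.normal(0.0, 1.0, size=(500, 2))  # samples from the true N(0, I)
bad = rng.normal(3.0, 1.0, size=(500, 2))   # badly biased forecast
y = np.zeros(2)
print(energy_score(good, y), energy_score(bad, y))  # good < bad
```

The paper's point is that such a score reliably separates good from bad forecasts only under certain finite-sample conditions, its "region of reliability".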
Natural language generation (NLG) is one of the most impactful fields in NLP,
and recent years have witnessed its evolution brought about by large language
models (LLMs). As the key instrument for writing assistance applications, they
are generally prone to replicating or extending offensive content provided in
the input. In low-resource data regimes, they can also produce repetitive
outputs. Usually, offensive content and repetitions are mitigated with post-hoc
methods, including n-gram level blocklists, top-k and nucleus sampling. In this
paper, we apply non-exact repetition suppression using token and sequence level
unlikelihood loss, and further explore the framework of unlikelihood training
objective in order to jointly endow the model with the ability to avoid
generating offensive words and phrases from the outset. Finally, with
comprehensive experiments, we demonstrate that our proposed methods work
exceptionally well in controlling the repetition and content quality of LLM outputs.
( 2
min )
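The shape of a token-level unlikelihood term is simple to write down: for each "negative" token (a recent repeat, or a word from a blocklist), penalize the probability the model assigns to it via -log(1 - p). The sketch below is the standard formulation of that term, not the paper's full training objective.

```python
import numpy as np

def token_unlikelihood_loss(probs, negative_ids):
    """Token-level unlikelihood: -sum over negative tokens of log(1 - p).
    probs: model's next-token distribution; negative_ids: tokens to
    discourage (e.g. recent repeats or an offensive-word list)."""
    p_neg = probs[list(negative_ids)]
    return -np.sum(np.log1p(-p_neg))  # log1p for numerical stability

probs = np.array([0.70, 0.20, 0.05, 0.05])  # toy next-token distribution
print(token_unlikelihood_loss(probs, negative_ids=[0]))  # heavily penalized
print(token_unlikelihood_loss(probs, negative_ids=[2]))  # nearly free
```

Added to the usual likelihood loss, this pushes probability mass away from the flagged tokens during training rather than filtering them post hoc.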
We use a binary attribute representation (BAR) model to describe a data set
of Netflix viewers' ratings of movies. We classify the viewers with discrete
bits rather than continuous parameters, which makes the representation compact
and transparent. The attributes are easy to interpret, and we need far fewer
attributes than similar methods do to achieve the same level of error. We also
take advantage of the nonuniform distribution of ratings among the movies in
the data set to train on a small selection of movies without compromising
performance on the rest of the movies.
( 2
min )
Bilevel optimization has recently regained interest owing to its applications
in emerging machine learning fields such as hyperparameter optimization,
meta-learning, and reinforcement learning. Recent results have shown that
simple alternating (implicit) gradient-based algorithms can achieve the same
convergence rate of single-level gradient descent (GD) for bilevel problems
with a strongly convex lower-level objective. However, it remains unclear
whether this result can be generalized to bilevel problems beyond this basic
setting. In this paper, we propose a Generalized ALternating mEthod for bilevel
opTimization (GALET) with a nonconvex lower-level objective that satisfies the
Polyak-{\L}ojasiewicz (PL) condition. We first introduce a stationary metric
for the considered bilevel problems, which generalizes the existing metric. We
then establish that GALET achieves an $\epsilon$-stationary metric for the
considered problem within $\tilde{\cal O}(\epsilon^{-1})$ iterations, which
matches the iteration complexity of GD for smooth nonconvex problems.
( 2
min )
We present a novel framework for conditional sampling of probability
measures, using block triangular transport maps. We develop the theoretical
foundations of block triangular transport in a Banach space setting,
establishing general conditions under which conditional sampling can be
achieved and drawing connections between monotone block triangular maps and
optimal transport. Based on this theory, we then introduce a computational
approach, called monotone generative adversarial networks (M-GANs), to learn
suitable block triangular maps. Our algorithm uses only samples from the
underlying joint probability measure and is hence likelihood-free. Numerical
experiments with M-GAN demonstrate accurate sampling of conditional measures in
synthetic examples, Bayesian inverse problems involving ordinary and partial
differential equations, and probabilistic image in-painting.
( 2
min )
While black-box variational inference is widely used, there is no proof that
its stochastic optimization succeeds. We suggest this is due to a theoretical
gap in existing stochastic optimization proofs: namely, the challenge of gradient
estimators with unusual noise bounds, and a composite non-smooth objective. For
dense Gaussian variational families, we observe that existing gradient
estimators based on reparameterization satisfy a quadratic noise bound and give
novel convergence guarantees for proximal and projected stochastic gradient
descent using this bound. This provides the first rigorous guarantee that
black-box variational inference converges for realistic inference problems.
( 2
min )
Personalized treatment effect estimates are often of interest in high-stakes
applications -- thus, before deploying a model estimating such effects in
practice, one needs to be sure that the best candidate from the ever-growing
machine learning toolbox for this task was chosen. Unfortunately, due to the
absence of counterfactual information in practice, it is usually not possible
to rely on standard validation metrics for doing so, leading to a well-known
model selection dilemma in the treatment effect estimation literature. While
some solutions have recently been investigated, systematic understanding of the
strengths and weaknesses of different model selection criteria is still
lacking. In this paper, instead of attempting to declare a global `winner', we
therefore empirically investigate success- and failure modes of different
selection criteria. We highlight that there is a complex interplay between
selection strategies, candidate estimators and the data used for comparing
them, and provide interesting insights into the relative (dis)advantages of
different criteria alongside desiderata for the design of further illuminating
empirical studies in this context.
( 2
min )
The Brazilian social justice reporter is a fellow at the MIT Center for International Studies.
( 11
min )
Announcements The Missing Part in LLMs and GPT-like Systems These days, all the AI talk is about GPT (Generative Pre-Trained Transformer), LLMs (Large Language Models), generative AI, prompt engineering, and related technologies. You must live alone on a small island if you have never heard these words. LLM originated from NLP (natural language processing) which… Read More »DSC Weekly 6 June 2023 – The Missing Part in LLMs and GPT-like Systems
The post DSC Weekly 6 June 2023 – The Missing Part in LLMs and GPT-like Systems appeared first on Data Science Central.
( 21
min )
PyTorch is a machine learning (ML) framework that is widely used by AWS customers for a variety of applications, such as computer vision, natural language processing, content creation, and more. With the recent PyTorch 2.0 release, AWS customers can now do the same things as they could with PyTorch 1.x but faster and at scale with […]
( 15
min )
Amazon Transcribe is a speech recognition service that generates transcripts from video and audio files in multiple supported languages and accents. It comes with a rich set of features, including automatic language identification, multi-channel and multi-speaker support, custom vocabularies, and transcript redaction. Amazon Transcribe supports two modes of operation: batch and streaming. In batch mode, […]
( 7
min )
Amazon SageMaker Feature Store is a purpose-built service to store and retrieve feature data for use by machine learning (ML) models. Feature Store provides an online store capable of low-latency, high-throughput reads and writes, and an offline store that provides bulk access to all historical record data. Feature Store handles the synchronization of data between […]
( 11
min )
Dear AI Innovators,
( 6
min )
Posted by Ruofei Du, Research Scientist, and Alex Olwal, Senior Staff Research Scientist, Google Augmented Reality
Recent advances in video conferencing have significantly improved remote video communication through features like live captioning and noise cancellation. However, there are various situations where dynamic visual augmentation would be useful to better convey complex and nuanced information. For example, when discussing what to order at a Japanese restaurant, your friends could share visuals that would help you feel more confident about ordering the “Sukiyaki”. Or when talking about your recent family trip to San Francisco, you may want to show a photo from your personal album.
In “Visual Captions: Augmenting Verbal Communication With On-the-fly Visuals”, presented at …
( 93
min )
As a marine biology student, Josef Melchner always dreamed of spending his days cruising the oceans to find dolphins, whales and fish — but also “wanted to do something practical, something that would benefit the world,” he said. When it came time to choose a career, he dove head first into aquaculture. He’s now CEO Read article >
( 6
min )
Keerthan Sathya, a senior technical artist specializing in 3D, emerged trium-elephant In the NVIDIA Studio this week with the incredibly detailed, expertly constructed, jaw-droppingly beautiful animation Tiny Mammoth.
( 7
min )
Artificial intelligence has emerged as a powerful technology that can drive substantial transformations in businesses across diverse…
( 11
min )
Artificial Intelligence (AI) has emerged as a transformative technology across various industries, and banking is no exception. In recent…
( 10
min )
Web scraping is a technique used to extract data from websites. It allows us to gather information from web pages and use it for various…
( 22
min )
A new multimodal technique blends major self-supervised learning methods to learn more similarly to humans.
( 9
min )
Data is the foundation for machine learning (ML) algorithms. One of the most common formats for storing large amounts of data is Apache Parquet due to its compact and highly efficient format. This means that business analysts who want to extract insights from the large volumes of data in their data warehouse must frequently use […]
( 8
min )
Amazon SageMaker Automatic Model Tuning has introduced Autotune, a new feature to automatically choose hyperparameters on your behalf. This provides an accelerated and more efficient way to find hyperparameter ranges, and can provide significant optimized budget and time management for your automatic model tuning jobs. In this post, we discuss this new capability and some […]
( 8
min )
This post is co-written with Philipp Schmid from Hugging Face. We have all heard about the progress being made in the field of large language models (LLMs) and the ever-growing number of problem sets where LLMs are providing valuable insights. Large models, when trained over massive datasets and several tasks, are also able to generalize […]
( 13
min )
This post is co-written with Philipp Schmid and Jeff Boudier from Hugging Face. Today, as part of Amazon Web Services’ partnership with Hugging Face, we are excited to announce the release of a new Hugging Face Deep Learning Container (DLC) for inference with Large Language Models (LLMs). This new Hugging Face LLM DLC is powered […]
( 7
min )
Jiusheng Chen’s team just got accelerated. They’re delivering personalized ads to users of Microsoft Bing with 7x throughput at reduced cost, thanks to NVIDIA Triton Inference Server running on NVIDIA A100 Tensor Core GPUs. It’s an amazing achievement for the principal software engineering manager and his crew. Tuning a Complex System Bing’s ad service uses Read article >
( 4
min )
Maria Girone is expanding the world’s largest network of scientific computers with accelerated computing and AI.
( 6
min )
Ambulatory surgery centers face unique financial challenges in the fast-paced healthcare industry. With AI, ASCs can unlock untapped revenue potential. AI revolutionizes revenue cycles, optimizes billing processes, and drives significant financial growth in ASCs. Healthcare is slower to adopt new technologies than manufacturing and retail. In our blog “Must Have Medical Practice Technologies to Boost… Read More »AI As A Catalyst For Financial Success In ASCs: Unlocking Revenue Potential
The post AI As A Catalyst For Financial Success In ASCs: Unlocking Revenue Potential appeared first on Data Science Central.
( 21
min )
Navigation is a complex skill with a long history of research in animals and
humans. In this work, we simulate the Morris Water Maze in 2D to train deep
reinforcement learning agents. We perform automatic classification of
navigation strategies, analyze the distribution of strategies used by
artificial agents, and compare them with experimental data to show similar
learning dynamics as those seen in humans and rodents. We develop
environment-specific auxiliary tasks and examine factors affecting their
usefulness. We suggest that the most beneficial tasks are potentially more
biologically feasible for real agents to use. Lastly, we explore the
development of internal representations in the activations of artificial agent
neural networks. These representations resemble place cells and head-direction
cells found in mouse brains, and their presence correlates with the
navigation strategies that artificial agents employ.
( 2
min )
Generative AI models have recently achieved astonishing results in quality
and are consequently employed in a fast-growing number of applications.
However, since they are highly data-driven, relying on billion-sized datasets
randomly scraped from the internet, they also suffer from degenerated and
biased human behavior, as we demonstrate. In fact, they may even reinforce such
biases. To not only uncover but also combat these undesired effects, we present
a novel strategy, called Fair Diffusion, to attenuate biases after the
deployment of generative text-to-image models. Specifically, we demonstrate
shifting a bias, based on human instructions, in any direction yielding
arbitrarily new proportions for, e.g., identity groups. As our empirical
evaluation demonstrates, this introduced control enables instructing generative
image models on fairness, with no data filtering and additional training
required.
( 2
min )
We consider deep neural networks with a Lipschitz continuous activation
function and with weight matrices of variable widths. We establish a uniform
convergence analysis framework in which sufficient conditions on weight
matrices and bias vectors together with the Lipschitz constant are provided to
ensure uniform convergence of the deep neural networks to a meaningful function
as the number of their layers tends to infinity. In the framework, special
results on uniform convergence of deep neural networks with a fixed width,
bounded widths and unbounded widths are presented. In particular, as
convolutional neural networks are special deep neural networks with weight
matrices of increasing widths, we put forward conditions on the mask sequence
which lead to uniform convergence of resulting convolutional neural networks.
The Lipschitz continuity assumption on the activation functions allows us to
include in our theory most of the activation functions commonly used in
applications.
( 2
min )
Matching algorithms are commonly used to predict matches between items in a
collection. For example, in 1:1 face verification, a matching algorithm
predicts whether two face images depict the same person. Accurately assessing
the uncertainty of the error rates of such algorithms can be challenging when
data are dependent and error rates are low, two aspects that have been often
overlooked in the literature. In this work, we review methods for constructing
confidence intervals for error rates in matching tasks such as 1:1 face
verification. We derive and examine the statistical properties of these methods
and demonstrate how coverage and interval width vary with sample size, error
rates, and degree of data dependence using both synthetic and real-world
datasets. Based on our findings, we provide recommendations for best practices
for constructing confidence intervals for error rates in matching tasks.
( 2
min )
Neural networks are powerful functions with widespread use, but the
theoretical behaviour of these functions is not fully understood. Creating deep
neural networks by stacking many layers has achieved exceptional performance in
many applications and contributed to the recent explosion of these methods.
Previous works have shown that depth can exponentially increase the
expressibility of the network. However, as networks get deeper and deeper, they
are more susceptible to becoming degenerate. We observe this degeneracy in the
sense that on initialization, inputs tend to become more and more correlated as
they travel through the layers of the network. If a network has too many
layers, it tends to approximate a (random) constant function, making it
effectively incapable of distinguishing between inputs. This seems to affect
the training of the network and cause it to perform poorly, as we empirically
investigate in this paper. We use a simple algorithm that can accurately
predict the level of degeneracy for any given fully connected ReLU network
architecture, and demonstrate how the predicted degeneracy relates to training
dynamics of the network. We also compare this prediction to predictions derived
using infinite width networks.
( 2
min )
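The degeneracy described above is easy to observe empirically: track the cosine similarity of two distinct inputs as they pass through random ReLU layers at initialization. The width, depth, and He initialization below are illustrative choices, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
width = 256
x = rng.normal(size=width)
y = rng.normal(size=width)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sims = [cos(x, y)]
for _ in range(30):
    # Fresh random layer with He initialization, then ReLU.
    W = rng.normal(size=(width, width)) * np.sqrt(2.0 / width)
    x, y = np.maximum(W @ x, 0.0), np.maximum(W @ y, 0.0)
    sims.append(cos(x, y))

print(round(sims[0], 3), round(sims[-1], 3))  # similarity creeps toward 1
```

Two initially near-orthogonal inputs end up highly correlated, which is the sense in which a very deep network at initialization approximates a constant function.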
Recent work on mini-batch consistency (MBC) for set functions has brought
attention to the need for sequentially processing and aggregating chunks of a
partitioned set while guaranteeing the same output for all partitions. However,
existing constraints on MBC architectures lead to models with limited
expressive power. Additionally, prior work has not addressed how to deal with
large sets during training when the full set gradient is required. To address
these issues, we propose a Universally MBC (UMBC) class of set functions which
can be used in conjunction with arbitrary non-MBC components while still
satisfying MBC, enabling a wider range of function classes to be used in MBC
settings. Furthermore, we propose an efficient MBC training algorithm which
gives an unbiased approximation of the full set gradient and has a constant
memory overhead for any set size for both train- and test-time. We conduct
extensive experiments including image completion, text classification,
unsupervised clustering, and cancer detection on high-resolution images to
verify the efficiency and efficacy of our scalable set encoding framework.
( 2
min )
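Mini-batch consistency in its simplest form is the property that sum- or mean-pooling over a set can be computed chunk-by-chunk with identical output for every partition. The sketch below shows just this base property; UMBC's contribution is allowing far more expressive (non-MBC) components on top of it.

```python
import numpy as np

def mbc_mean_pool(chunks):
    """Mean-pool a partitioned set by accumulating (sum, count) per chunk
    and normalizing once; the result is partition-independent."""
    total, count = None, 0
    for c in chunks:
        c = np.asarray(c, dtype=float)
        total = c.sum(axis=0) if total is None else total + c.sum(axis=0)
        count += len(c)
    return total / count

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))  # a set of 10 elements
full = X.mean(axis=0)
chunked = mbc_mean_pool([X[:3], X[3:7], X[7:]])
print(np.allclose(full, chunked))  # True: same output for any partition
```

The constant-memory streaming structure here is also what the proposed training algorithm exploits to approximate the full-set gradient.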
The prediscretisation of numerical attributes required by some rule
learning algorithms is a source of inefficiency. This paper describes new
rule tuning steps that aim to recover lost information in the discretisation
and new pruning techniques that may further reduce the size of rule models and
improve their accuracy. The proposed QCBA method was initially developed to
postprocess quantitative attributes in models generated by the Classification
based on associations (CBA) algorithm, but it can also be applied to the
results of other rule learning approaches. We demonstrate the effectiveness on
the postprocessing of models generated by five association rule classification
algorithms (CBA, CMAR, CPAR, IDS, SBRL) and two first-order logic rule learners
(FOIL2 and PRM). Benchmarks on 22 datasets from the UCI repository show smaller
size and the overall best predictive performance for FOIL2+QCBA compared to all
seven baselines. Postoptimised CBA models have a better predictive performance
compared to the state-of-the-art rule learner CORELS in this benchmark. The
article contains an ablation study for the individual postprocessing steps and
a scalability analysis on the KDD'99 Anomaly detection dataset.
( 2
min )
Machine Learning (ML) algorithms are vulnerable to poisoning attacks, where a
fraction of the training data is manipulated to deliberately degrade the
algorithms' performance. Optimal attacks can be formulated as bilevel
optimization problems and help to assess their robustness in worst-case
scenarios. We show that current approaches, which typically assume that
hyperparameters remain constant, lead to an overly pessimistic view of the
algorithms' robustness and of the impact of regularization. We propose a novel
optimal attack formulation that considers the effect of the attack on the
hyperparameters and models the attack as a multiobjective bilevel optimization
problem. This allows us to formulate optimal attacks, learn hyperparameters and
evaluate robustness under worst-case conditions. We apply this attack
formulation to several ML classifiers using $L_2$ and $L_1$ regularization. Our
evaluation on multiple datasets confirms the limitations of previous strategies
and evidences the benefits of using $L_2$ and $L_1$ regularization to dampen
the effect of poisoning attacks.
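Schematically (in our notation, not necessarily the paper's), the difference lies in where the hyperparameters $\lambda$ come from. The standard fixed-hyperparameter attack is

$$\max_{D_p}\; \mathcal{A}(D_{\text{val}}, \theta^*) \quad \text{s.t.}\quad \theta^* \in \arg\min_{\theta}\, \mathcal{L}(D_{\text{tr}} \cup D_p, \theta; \lambda), \qquad \lambda \text{ fixed},$$

whereas the multiobjective variant couples the lower level with hyperparameter learning on clean validation data, $\lambda^* \in \arg\min_{\lambda} \mathcal{L}_{\text{val}}(D_{\text{val}}, \theta^*(\lambda))$, so the attacker must account for how the poisoned data shifts the learned $\lambda^*$.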
( 2
min )
The relational data model was designed to facilitate large-scale data
management and analytics. We consider the problem of how to differentiate
computations expressed relationally. We show experimentally that a relational
engine running an auto-differentiated relational algorithm can easily scale to
very large datasets, and is competitive with state-of-the-art, special-purpose
systems for large-scale distributed machine learning.
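As a toy illustration (our example, not the paper's system): when training tuples live in a relation, the gradient of a squared loss is itself a relational aggregate, so a full gradient-descent loop can be driven entirely by queries:

```python
# training examples stored as tuples in a relation R(id, x, y);
# the gradient of the squared loss for y ~ w*x is the aggregate
#   SELECT SUM(2 * (w * x - y) * x) FROM R
R = [{"id": 1, "x": 1.0, "y": 2.0},
     {"id": 2, "x": 2.0, "y": 4.1},
     {"id": 3, "x": 3.0, "y": 5.9}]

def grad(w):
    return sum(2.0 * (w * t["x"] - t["y"]) * t["x"] for t in R)

w = 0.0
for _ in range(200):           # gradient descent driven by the aggregate query
    w -= 0.01 * grad(w) / len(R)
print(w)  # converges to the least-squares slope, about 1.99
```

A relational engine can parallelize and distribute this aggregate like any other query, which is the scaling argument the abstract makes.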
( 2
min )
We introduce an efficient and robust auto-tuning framework for hyperparameter
selection in dimension reduction (DR) algorithms, focusing on large-scale
datasets and arbitrary performance metrics. By leveraging Bayesian optimization
(BO) with a surrogate model, our approach enables efficient hyperparameter
selection with multi-objective trade-offs and allows us to perform data-driven
sensitivity analysis. By incorporating normalization and subsampling, the
proposed framework demonstrates versatility and efficiency, as shown in
applications to visualization techniques such as t-SNE and UMAP. We evaluate
our results on various synthetic and real-world datasets using multiple quality
metrics, providing a robust and efficient solution for hyperparameter selection
in DR algorithms.
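A dependency-free sketch of such a tuning loop, with the Bayesian-optimization surrogate replaced by random search and a synthetic quality metric (both are stand-ins for the paper's components, not its actual method):

```python
import random

def quality(perplexity_like, sample):
    # stand-in for a real DR quality metric (e.g. trustworthiness);
    # here a synthetic score peaked near 30, mimicking a t-SNE perplexity sweep
    return -abs(perplexity_like - 30) / 30

def tune(data, n_trials=50, subsample=100, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(data, min(subsample, len(data)))  # subsampling for speed
    best_p, best_q = None, float("-inf")
    for _ in range(n_trials):
        p = rng.uniform(5, 100)       # hyperparameter search space
        q = quality(p, sample)        # evaluate DR quality on the subsample
        if q > best_q:
            best_p, best_q = p, q
    return best_p

data = list(range(1000))
print(tune(data))  # a perplexity-like value near the metric's optimum of 30
```

The paper's contribution is precisely to replace this blind search with a BO surrogate, which needs far fewer evaluations of the (expensive) embedding-plus-metric step.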
( 2
min )
Scaling methods have long been utilized to simplify and cluster
high-dimensional data. However, the general latent spaces these methods derive
across all predefined groups sometimes fail to capture the specific
within-group patterns that interest researchers. To tackle this issue, we
adopt an emerging analysis approach called contrastive learning. We contribute
to this growing field by extending its ideas to multiple correspondence
analysis (MCA) in order to enable an analysis of data often encountered by
social scientists -- containing binary, ordinal, and nominal variables. We
demonstrate the utility of contrastive MCA (cMCA) by analyzing two different
surveys of voters in the U.S. and U.K. Our results suggest that, first, cMCA
can identify substantively important dimensions and divisions among subgroups
that are overlooked by traditional methods; second, in other cases, cMCA can
derive latent traits that emphasize subgroups only faintly visible in those
derived by traditional methods.
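The contrastive idea can be sketched with contrastive PCA on 2-D data (cMCA itself operates on MCA's indicator-matrix decomposition; this toy, entirely our construction, only shows the "subtract alpha times the background covariance" step):

```python
import math

def cov2(points):
    # 2x2 covariance of a list of (x, y) points, returned as (a, b, c)
    # for the symmetric matrix [[a, b], [b, c]]
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return a, b, c

def leading_eigvec(a, b, c):
    # leading eigenvector of the symmetric 2x2 matrix [[a, b], [b, c]]
    lam = ((a + c) + math.sqrt((a - c) ** 2 + 4 * b * b)) / 2
    if abs(b) < 1e-12:
        return (1.0, 0.0) if a >= c else (0.0, 1.0)
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)

def contrastive_direction(foreground, background, alpha):
    fa, fb, fc = cov2(foreground)
    ba, bb, bc = cov2(background)
    # direction of maximal foreground variance not explained by background
    return leading_eigvec(fa - alpha * ba, fb - alpha * bb, fc - alpha * bc)

fg = [(-1.0, -2.0), (1.0, 2.0), (-1.0, 2.0), (1.0, -2.0)]  # varies mostly in y
bg = [(-3.0, 0.0), (3.0, 0.0)]                              # varies only in x
print(contrastive_direction(fg, bg, alpha=0.5))  # picks out the y-axis
```

Ordinary PCA on the foreground alone would still mix in the x-variation shared with the background; the contrastive term suppresses it.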
( 2
min )
The strength of modern generative models lies in their ability to be
controlled through text-based prompts. Typical "hard" prompts are made from
interpretable words and tokens, and must be hand-crafted by humans. There are
also "soft" prompts, which consist of continuous feature vectors. These can be
discovered using powerful optimization methods, but they cannot be easily
interpreted, re-used across models, or plugged into a text-based interface.
We describe an approach to robustly optimize hard text prompts through
efficient gradient-based optimization. Our approach automatically generates
hard text-based prompts for both text-to-image and text-to-text applications.
In the text-to-image setting, the method creates hard prompts for diffusion
models, allowing API users to easily generate, discover, and mix and match
image concepts without prior knowledge of how to prompt the model. In the
text-to-text setting, we show that hard prompts can be automatically discovered
that are effective in tuning LMs for classification.
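The core trick can be caricatured as "optimize a continuous embedding, then snap it to the nearest real token." The toy below (our 2-D construction, not the paper's algorithm, which evaluates gradients at the projected prompt inside the loop and works on actual model embeddings) shows only that final projection step:

```python
# a tiny "vocabulary" of token embeddings in 2-D
vocab = {"cat": (1.0, 0.0), "dog": (0.0, 1.0),
         "car": (-1.0, 0.0), "sky": (0.0, -1.0)}

def nearest_token(v):
    # project a continuous embedding onto the closest hard token
    return min(vocab, key=lambda t: sum((a - b) ** 2 for a, b in zip(vocab[t], v)))

target = (0.2, 0.9)          # embedding we want the prompt to match
soft = [0.0, 0.0]            # continuous ("soft") prompt embedding
for _ in range(100):
    # gradient of squared distance to target, then a small descent step
    soft = [s - 0.1 * 2 * (s - t) for s, t in zip(soft, target)]
print(nearest_token(soft))  # prints "dog", the hard token nearest the optimum
```

The payoff of projecting back to vocabulary is exactly what the abstract claims: the result is a readable token sequence that can be reused across models and typed into any text interface.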
( 2
min )
We propose a novel method to optimize the structure of factor graphs for
graph-based inference. As an example inference task, we consider symbol
detection on linear inter-symbol interference channels. The factor graph
framework has the potential to yield low-complexity symbol detectors. However,
the sum-product algorithm on cyclic factor graphs is suboptimal and its
performance is highly sensitive to the underlying graph. Therefore, we optimize
the structure of the underlying factor graphs in an end-to-end manner using
machine learning. For that purpose, we transform the structural optimization
into a clustering problem of low-degree factor nodes that incorporates the
known channel model into the optimization. Furthermore, we study the
combination of this approach with neural belief propagation, yielding
near-maximum a posteriori symbol detection performance for specific channels.
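For readers unfamiliar with the sum-product algorithm the abstract builds on, here is its message computation on a minimal acyclic factor graph (two binary variables; on the cyclic graphs of the paper the same updates are iterated and become approximate):

```python
# factor graph: f1(x1) -- x1 -- f12(x1, x2) -- x2, variables binary {0, 1}
f1 = {0: 0.3, 1: 0.7}
f12 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}

# message from f1 to x1 is f1 itself; x1 forwards it unchanged to f12;
# the message from f12 to x2 sums out x1 (the "sum-product" step)
msg_to_x2 = {x2: sum(f1[x1] * f12[(x1, x2)] for x1 in (0, 1)) for x2 in (0, 1)}
z = sum(msg_to_x2.values())
marginal_x2 = {x2: m / z for x2, m in msg_to_x2.items()}
print(marginal_x2)  # approximately {0: 0.41, 1: 0.59}
```

On a tree this recovers exact marginals; the paper's point is that on cyclic graphs the quality of the loopy version depends heavily on which factor nodes are clustered together, which is what their learned optimization targets.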
( 2
min )
Recent advances to combine structured regression models and deep neural
networks for better interpretability, more expressiveness, and statistically
valid uncertainty quantification demonstrate the versatility of semi-structured
neural networks (SSNs). We show, however, that existing techniques to properly
identify the contributions of the different model components in SSNs lead to
suboptimal network estimation, slower convergence, and degenerate or erroneous
predictions. In order to solve these problems while preserving favorable model
properties, we propose a non-invasive post-hoc orthogonalization (PHO) that
guarantees identifiability of model components and provides better estimation
and prediction quality. Our theoretical findings are supported by numerical
experiments, a benchmark comparison as well as a real-world application to
COVID-19 infections.
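PHO projects the fitted network predictions onto the orthogonal complement of the structured design matrix after training. With a single structured predictor this reduces to one projection, sketched below in our notation (the paper handles a full design matrix):

```python
# remove from the network output u its component along the structured
# predictor x, so the linear effect of x remains identifiable
def orthogonalize(u, x):
    coef = sum(a * b for a, b in zip(x, u)) / sum(a * a for a in x)
    u_perp = [ui - coef * xi for ui, xi in zip(u, x)]
    return u_perp, coef

x = [1.0, 2.0, 3.0, 4.0]       # structured predictor values
u = [2.1, 4.2, 5.9, 8.1]       # deep-network outputs, nearly 2 * x
u_perp, coef = orthogonalize(u, x)
print(coef)  # the linear effect absorbed from u, about 2.02
```

The residual `u_perp` is exactly orthogonal to `x`, so the recovered coefficient can be attributed to the structured part alone; being post-hoc, the step never interferes with network training, which is why it avoids the convergence problems described above.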
( 2
min )
We propose an efficient algorithm for matching two correlated
Erd\H{o}s--R\'enyi graphs with $n$ vertices whose edges are correlated through
a latent vertex correspondence. When the edge density $q= n^{- \alpha+o(1)}$
for a constant $\alpha \in [0,1)$, we show that our algorithm has polynomial
running time and succeeds in recovering the latent matching as long as the edge
correlation is non-vanishing. This is closely related to our previous work on a
polynomial-time algorithm that matches two Gaussian Wigner matrices with
non-vanishing correlation, and provides the first polynomial-time random graph
matching algorithm (regardless of the regime of $q$) when the edge correlation
is below the square root of Otter's constant (which is $\approx 0.338$).
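To make the model concrete, here is one standard way to sample such a correlated pair (a generic subsampling construction with our parameterization, simplified relative to the paper's setup):

```python
import itertools
import random

def correlated_er_pair(n, q, s, seed=0):
    """Each parent edge (drawn with prob q/s) survives independently with
    prob s in each copy; the second copy is relabeled by a hidden
    permutation pi, the latent vertex correspondence to be recovered."""
    rng = random.Random(seed)
    pi = list(range(n))
    rng.shuffle(pi)                       # the latent vertex correspondence
    A, B = set(), set()
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < q / s:          # edge present in the parent graph
            if rng.random() < s:
                A.add((i, j))
            if rng.random() < s:
                B.add(tuple(sorted((pi[i], pi[j]))))
    return A, B, pi

A, B, pi = correlated_er_pair(6, 0.5, 1.0, seed=1)
print(len(A), len(B))  # with s = 1 the graphs are identical up to pi
```

The matching problem is to recover `pi` from `A` and `B` alone; the abstract's result is that this is possible in polynomial time whenever the edge correlation (governed here by `s`) is non-vanishing.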
( 2
min )
Sponsored Post: Attend the Data Science Symposium 2022 on November 8. The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )